1 Introduction


Object recognition is a central problem in computer vision, and model-based methods
constitute one prevalent approach to this problem. In the model-based approach, a set
of geometric features that constitute a model of an object are compared against like
features that have been extracted from an image of a scene (cf. [3, 7]). The process
of comparing a model with an image generally involves determining a valid
correspondence between a subset of the model features and a subset of the features
found in the image. In order for such a correspondence to be valid, it is usually
required that there exist some transformation of a given type mapping each model
feature onto its corresponding image feature. This transformation generally specifies
the pose of the object, that is, its position and orientation with respect to the
image coordinate system. The quality of a given hypothesized transformation is then
evaluated based on the number of model features that are brought into correspondence
with image features. Thus the task of model-based recognition can be viewed as finding
legal transformations from a model to an image, and then determining whether one or
more of these transformations accounts for a sufficiently large portion of the model
and the observed data.

A number of recent model-based recognition systems have used affine transformations
of the plane to represent the mapping from a two-dimensional model to a
two-dimensional image (e.g. [4, 5, 12, 13, 16, 17, 18, 19, 21, 22]). This type of
transformation can be used to approximate the two-dimensional image of a flat (planar)
object at an arbitrary orientation in three-dimensional space. The transformation is
equivalent to a three-dimensional rigid motion of the object, followed by orthographic
projection and scaling (dilation). The scale factor accounts for the fact that objects
which are farther away appear smaller than those which are close. This affine viewing
model does not capture the perspective distortions that occur in real camera systems,
because affine transformations preserve parallelism. It is a relatively good
approximation to perspective except when an object is deep with respect to its
distance from the viewer (e.g., railroad tracks going off to the horizon).

Recognition systems that make use of two-dimensional affine transformations fall into
two basic classes. Methods in the first class explicitly compute an affine
transformation based on the correspondence of a set of 'basis features' in the image
and the model. This transformation is applied to the remaining model features in order
to map them into the image coordinate frame, where they are compared with image
features [2, 12, 13, 21]. Methods in the second class compute affine-invariant
representations of the model and the image, and directly compare these invariant
representations [4, 5, 16, 17, 18, 19, 22]. In either case, recognition systems that
employ affine transformations generally do not explicitly account for sensory
uncertainty, but rather use some heuristic means to allow for uncertainty in the
location of sensory data (one notable exception is [4], which formulates a
probabilistic method). In this paper we provide a precise account of how uncertainty
in the image measurements affects the range of transformations that are consistent
with a given configuration of points acting under an affine transformation. This is
important both for analyzing current recognition methods that employ affine
transformations, and for developing new recognition methods that explicitly account
for uncertainty.

We use our formal model of approximate affine matching to analyze two recognition
methods that employ affine transformations: the geometric hashing method
[16, 17, 18, 19, 22] and the alignment method [12, 13]. Under our model each sensor
location is represented as an uncertainty disc of radius $\epsilon$, rather than as a
specific $(x, y)$ location. For the alignment method, we use this model to provide a
precise expression for the range of image point configurations that are consistent
with a given quadruple of model points acting under an affine transformation. That is,
we characterize what image points can match a given model point for a particular model
and image basis (coordinate frame). This determines the range of possible matches for
features, and hence the range of possible solutions to the recognition problem. For
the geometric hashing method, we provide a similar analysis for the range of
affine-invariant coordinates that are consistent with a given quadruple of points.
This analysis reveals that when there is uncertainty in the data, the geometric
hashing method cannot operate as originally proposed. The problem is that the
uncertainty in the image point locations causes the range of values consistent with a
given quadruple of model points to depend on the specific locations of the image
points. The geometric hashing method proposes to build a fast lookup table based just
on the model, and thus cannot account for the uncertainty using this table. We show
how geometric hashing can be modified so that error can be precisely accounted for at
run time, although this substantially changes the method.

1.1  Affine  Transformations  and  Invariant  Representations 

An affine transformation of the plane can be represented as a nonsingular $2 \times 2$
matrix $L$ and a 2-vector $t$, such that a given point $x$ is transformed to
$x' = Lx + t$. It is well known that such a transformation maps any triple of points
to any other triple (except in degenerate cases), and that three points define an
affine coordinate frame (analogous to a Cartesian coordinate frame in the case of
Euclidean transformations) [6, 14]. In particular, a set of three points $m_1$, $m_2$,
and $m_3$ defines an affine coordinate frame in terms of which any other point $x$ can
be expressed using

$$x = m_1 + \alpha(m_2 - m_1) + \beta(m_3 - m_1). \qquad (1)$$

The values $\alpha$ and $\beta$ remain unchanged when a given affine transformation $A$
is applied to $x$, $m_1$, $m_2$, and $m_3$. That is,

$$A(x) = A(m_1) + \alpha(A(m_2) - A(m_1)) + \beta(A(m_3) - A(m_1)),$$

where $A$ is any affine transformation. Thus the pair $(\alpha, \beta)$ can be referred
to as the affine-invariant coordinates of the point $x$ with respect to the coordinate
frame, or basis, $(m_1, m_2, m_3)$. We can think of $(\alpha, \beta)$ as a point in a
two-dimensional space that we term the $\alpha$-$\beta$-plane.
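The invariance of $(\alpha, \beta)$ can be checked numerically. The following is a minimal Python sketch, not taken from the systems cited here: the helper names `affine_coords` and `apply_affine` and the sample points are assumptions made for illustration. It solves equation (1) for $(\alpha, \beta)$ by Cramer's rule and recomputes the coordinates after an arbitrary nonsingular affine map.

```python
def affine_coords(x, m1, m2, m3):
    """Solve x = m1 + a*(m2 - m1) + b*(m3 - m1) for (a, b) by Cramer's rule."""
    ux, uy = m2[0] - m1[0], m2[1] - m1[1]
    vx, vy = m3[0] - m1[0], m3[1] - m1[1]
    dx, dy = x[0] - m1[0], x[1] - m1[1]
    det = ux * vy - uy * vx
    if det == 0:
        raise ValueError("basis points are collinear (degenerate frame)")
    a = (dx * vy - dy * vx) / det
    b = (ux * dy - uy * dx) / det
    return a, b


def apply_affine(L, t, p):
    """Return L p + t for a 2x2 matrix L (nested tuples) and 2-vector t."""
    return (L[0][0] * p[0] + L[0][1] * p[1] + t[0],
            L[1][0] * p[0] + L[1][1] * p[1] + t[1])


if __name__ == "__main__":
    # Hypothetical model basis and fourth point.
    m1, m2, m3, x = (0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (1.0, 3.0)
    # Hypothetical nonsingular affine map (det L = 5).
    L, t = ((1.0, 2.0), (-1.0, 3.0)), (4.0, -7.0)
    before = affine_coords(x, m1, m2, m3)
    after = affine_coords(*(apply_affine(L, t, p) for p in (x, m1, m2, m3)))
    print(before, after)  # the two pairs agree up to rounding
```

The sketch raises an error for collinear basis points, which is the degenerate case mentioned above.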
The computation of an affine-invariant representation in terms of a coordinate frame
$(m_1, m_2, m_3)$ has been used explicitly in the alignment [12, 13] and geometric
hashing [16, 17, 18, 19, 22] methods. Both methods are motivated by the idea of
finding sets of points in the image that are related to a corresponding set of points
in the model by an affine transformation. The major difference between the two methods
is whether the computations are done in a Euclidean space (i.e., the Cartesian
coordinate systems of the model and the image) or the affine-invariant space of the
$(\alpha, \beta)$ values. The alignment approach operates in the former domain,
whereas the geometric hashing approach operates in the latter one.

We examine the effect of sensory uncertainty both in the Euclidean plane and the
affine $\alpha$-$\beta$-space. In particular, we model each sensor point in terms of a
disc of possible locations. The size of this disc is bounded by some given uncertainty
factor, $\epsilon$. We then consider the range of values for a fourth point written in
terms of the basis defined by the other three points, where all points have bounded
uncertainty. We find that under this error model, in the Euclidean space the set of
possible values for a given point $x$ in terms of a basis $(s_1, s_2, s_3)$ forms a
disc whose radius depends on $\epsilon$, $\alpha$, and $\beta$. That is, assuming that
each image point has a sensing uncertainty of magnitude $\epsilon$, the range of image
locations that are consistent with $x$ forms a circular region.

In the $\alpha$-$\beta$-space, the set of possible values of the affine coordinates of
a point $x$ in terms of a basis $(s_1, s_2, s_3)$ forms an ellipse (except in
degenerate cases). The area, center and orientation of this ellipse are given by
somewhat complicated expressions that depend on the actual configuration of the basis
points. The most important consequence of this analysis is that the set of possible
values in the $\alpha$-$\beta$-plane cannot be computed independent of the actual
locations of the basis points $s_1$, $s_2$, $s_3$ in the sensor coordinate system. In
other words there is an interaction between the uncertainty in the sensor values and
the actual locations of the sensor points. This limits the applicability of the
geometric hashing method, as it requires that the $\alpha$-$\beta$ coordinates be
computable independent of the actual location of the basis points (in order to
construct a hash table offline).

Having derived expressions for the range of locations consistent with a given point
$x$ and a pair of bases $(m_1, m_2, m_3)$ and $(s_1, s_2, s_3)$, we then use these
expressions to analyze the sensitivity of the alignment and geometric hashing methods
to the presence of sensor noise. We develop equations giving the probability that
these methods will falsely report a match when none is present, using techniques
similar to those developed in [9, 10]. For the geometric hashing method, our analysis
assumes that the true elliptical regions in the $\alpha$-$\beta$-plane are being
computed, even though the actual implementations of the geometric hashing method do
not compute these values. Thus the real implementations will suffer even more from the
problem of false matches (or alternatively will have the problem of missing correct
matches).

Figure 1: A schematization of the relation between the image coordinate frame, the
model coordinate frame and the affine-invariant $\alpha$-$\beta$-space.

2 Image Uncertainty and Affine Coordinates

The main issue we wish to explore is the following: Given a model basis of three
points and some additional model point, what sets of four image features are possible
transformed instances of these points? In other words, for what quadruples of image
points is there an affine transformation defined by pairing three of the image points
with the three model basis points, such that the fourth model point is transformed
into agreement with the fourth image point? Figure 1 schematizes the situation. A set
of model points are given in a Cartesian coordinate frame, and some distinguished
basis triple is also specified. Similarly a set of image points are given in their
coordinate frame. Two different methods are used to map between the model and the
image. One method, employed by geometric hashing, is to map both the model and the
image points to $(\alpha, \beta)$ values using the basis triples. The other method,
used by alignment, is to compute the transformation mapping the model basis to the
image basis, and then use this transformation to map the model points to image
coordinates. In both cases, a distinguished set of three model and image points is
used to map a fourth point (or many such points) into some other space. We consider
the effects that uncertainty has on these two methods, by modeling each image point as
an $\epsilon$-sized disc of possible locations rather than as a specific point.

First we characterize the range of image measurements in the $x$-$y$ (Euclidean) plane
that are consistent with the $(\alpha, \beta)$ pair computed for a given quadruple of
model points, as specified by equation (1). This corresponds to the case of explicitly
computing a transformation from one Cartesian coordinate frame (the model) to another
(the image). We find that if the uncertainty in the locations of the sensor points is
bounded by a disc of radius $\epsilon$, then the range of possible image measures
consistent with a given $(\alpha, \beta)$ pair is a disc with radius bounded below by
$\epsilon(1 + |\alpha| + |\beta|)$ and above by $2\epsilon(1 + |\alpha| + |\beta|)$.
This defines the set of image points that could match a specific model point, given
both an image and model basis.

We then perform the same analysis for the range of affine coordinate, or $\alpha$ and
$\beta$, values that are consistent with a given quadruple of points. This corresponds
to the case of mapping both the model and image points to $(\alpha, \beta)$ values. In
order to do this, we use the expressions that we derived for the Euclidean case to
determine the region of $\alpha$-$\beta$-space that is consistent with a given point
and basis. This region of $\alpha$-$\beta$-space is in general an ellipse containing
the point $(\alpha, \beta)$ (but not necessarily centered at that point). The
expressions for the size, orientation and position of the ellipse depend on the actual
locations of the points defining the basis.

Assume that we are given three model points, $m_1$, $m_2$, $m_3$, and the affine
coordinates $(\alpha, \beta)$ of a fourth model point $x$ defined by

$$x = m_1 + \alpha(m_2 - m_1) + \beta(m_3 - m_1). \qquad (2)$$

Further assume that we are given three sensor points $s_1$, $s_2$, $s_3$, such that

$$s_i = T(m_i) + e_i,$$

where $T$ is some affine transformation, and $e_i$ is an arbitrary vector of magnitude
at most $\epsilon_i$. That is, $T$ is some underlying affine transformation that cannot
be directly observed in the data because each data point has been perturbed by some
arbitrary vector $e_i$. These error vectors are assumed to be bounded by $\epsilon_i$,
using our error model that represents a point as a disc of radius $\epsilon_i$. (Note
that in general we will always use $\epsilon_i = \epsilon$, but in principle one could
allow different amounts of bounded uncertainty with different features.)

We are interested in the possible locations of a fourth sensor point, call it
$\hat{x}$, such that $\hat{x}$ could correspond to the ideally transformed point
$T(x)$. We note that the possible positions of $\hat{x}$ are affected both by the
sensor error in measuring each image basis point, $s_i$, and by the error in measuring
the fourth point itself. Thus the possible locations are given by transforming the
invariant representation of equation (2) and adding in the error $e_0$ from measuring
$x$,

$$\hat{x} = T(m_1 + \alpha(m_2 - m_1) + \beta(m_3 - m_1)) + e_0$$
$$= s_1 - e_1 + \alpha(s_2 - e_2 - s_1 + e_1) + \beta(s_3 - e_3 - s_1 + e_1) + e_0$$
$$= s_1 + \alpha(s_2 - s_1) + \beta(s_3 - s_1) - e_1 + \alpha(e_1 - e_2) + \beta(e_1 - e_3) + e_0.$$

That is, the measured point $\hat{x}$ can lie in a range of locations about the ideal
location, specified by

$$s_1 + \alpha(s_2 - s_1) + \beta(s_3 - s_1).$$

This range of possible locations is specified by the linear combination of the four
error vectors

$$-e_1 + \alpha(e_1 - e_2) + \beta(e_1 - e_3) + e_0,$$

or equivalently

$$-[(1 - \alpha - \beta)e_1 + \alpha e_2 + \beta e_3 - e_0], \qquad (3)$$

where each $e_i$ is an arbitrary vector of length at most $\epsilon_i$.
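As a numerical check on the derivation of expression (3), the following Python sketch perturbs a set of transformed basis points and compares the offset of the measured fourth point from the nominal location against expression (3). The function names and the sample coordinates are assumptions introduced for illustration; since an affine map takes a generic triple to any other triple, the sketch simply picks hypothetical values for $T(m_1)$, $T(m_2)$, $T(m_3)$ rather than constructing $T$.

```python
def combine(p1, p2, p3, alpha, beta):
    """Return p1 + alpha*(p2 - p1) + beta*(p3 - p1) for 2D points."""
    return (p1[0] + alpha * (p2[0] - p1[0]) + beta * (p3[0] - p1[0]),
            p1[1] + alpha * (p2[1] - p1[1]) + beta * (p3[1] - p1[1]))


def add(p, q):
    return (p[0] + q[0], p[1] + q[1])


def expression_3(alpha, beta, e0, e1, e2, e3):
    """Evaluate -[(1 - alpha - beta) e1 + alpha e2 + beta e3 - e0]."""
    w1 = 1 - alpha - beta
    return (-(w1 * e1[0] + alpha * e2[0] + beta * e3[0] - e0[0]),
            -(w1 * e1[1] + alpha * e2[1] + beta * e3[1] - e0[1]))


def measured_minus_nominal(alpha, beta, t_basis, e0, e1, e2, e3):
    """Offset of the measured fourth point from s1 + alpha(s2-s1) + beta(s3-s1)."""
    t1, t2, t3 = t_basis                      # noise-free T(m1), T(m2), T(m3)
    ideal = combine(t1, t2, t3, alpha, beta)  # T(x), by affine invariance
    measured = add(ideal, e0)                 # fourth point with its own error
    s1, s2, s3 = add(t1, e1), add(t2, e2), add(t3, e3)  # perturbed basis
    nominal = combine(s1, s2, s3, alpha, beta)
    return (measured[0] - nominal[0], measured[1] - nominal[1])


if __name__ == "__main__":
    basis = ((3.0, 1.0), (40.0, 7.0), (12.0, 55.0))   # hypothetical values
    errs = ((0.3, -0.1), (-0.2, 0.4), (0.5, 0.0), (0.1, -0.45))
    print(measured_minus_nominal(0.4, -1.3, basis, *errs))
    print(expression_3(0.4, -1.3, *errs))
```

The two printed vectors should agree up to rounding for any choice of basis, coefficients and error vectors, since the offset does not depend on the basis locations.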

The set of all possible locations specified by a given $e_i$ is a disc of radius
$\epsilon_i$ about the origin, which we denote $C(\epsilon_i)$:

$$C(\epsilon_i) = \{ e_i \mid \|e_i\| \le \epsilon_i \}.$$

Similarly, the product of any constant $k$ with $e_i$ yields a disc $C(k\epsilon_i)$ of
radius $|k|\epsilon_i$ centered about the origin. Thus substituting the expressions for
the disc in equation (3) we obtain the following expression for the set of all
locations about the ideal point $s_1 + \alpha(s_2 - s_1) + \beta(s_3 - s_1)$:

$$C([1 - \alpha - \beta]\epsilon_1) \oplus C(\alpha\epsilon_2) \oplus C(\beta\epsilon_3) \ominus C(\epsilon_0), \qquad (4)$$

where $\oplus$ denotes the Minkowski sum of sets. That is, given two sets $A$ and $B$,
$A \oplus B = \{ p + q \mid p \in A, q \in B \}$ (and similarly for $\ominus$).

In order to simplify the expression for the range of $\hat{x}$ we make use of the
following fact, which follows directly from the definition of the Minkowski sum for
sets.

Claim 1 $C(r_1) \oplus C(r_2) = C(r_1) \ominus C(r_2) = C(r_1 + r_2)$, where $C(r_i)$
is a disc of radius $r_i$ centered about the origin, $r_i > 0$.

If we assume that the $\epsilon_i$ are all equal to $\epsilon$ (i.e., all the sensor
error bounds are the same), then using Claim 1 we can simplify equation (4) to

$$C(\epsilon[|1 - \alpha - \beta| + |\alpha| + |\beta| + 1]).$$

The absolute values arise from the fact that $\alpha$ and $\beta$ can become negative,
but the radius of a disc is a positive quantity. Clearly the radius of the error disc
grows with increasing magnitude of $\alpha$ and $\beta$, but the actual expression
governing this growth is different for different portions of the
$\alpha$-$\beta$-plane, as shown in Figure 2 (the diagonal line in the figure is
$1 - \alpha - \beta = 0$). In particular, the absolute values will lead to different
expressions for the radius of the error disc as a function of $\alpha$ and $\beta$, as
illustrated in the figure.
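The piecewise behaviour of the radius can be made concrete with a short Python sketch. The function names are assumptions for illustration. The second function uses an identity that is not stated in the text but follows from the fact that the three weights $1 - \alpha - \beta$, $\alpha$ and $\beta$ sum to one: the sum of their absolute values plus one equals twice the sum of the positive weights, so each region of the $\alpha$-$\beta$-plane gets a simple linear expression.

```python
def disc_radius(alpha, beta, eps):
    """Radius eps * (|1 - alpha - beta| + |alpha| + |beta| + 1) of the error disc."""
    return eps * (abs(1 - alpha - beta) + abs(alpha) + abs(beta) + 1)


def disc_radius_by_region(alpha, beta, eps):
    """Same radius, written as 2 * eps * (sum of the positive weights).

    The weights (1 - alpha - beta), alpha, beta sum to 1, so
    sum(|w|) + 1 = 2 * sum(max(w, 0)). Which weights are positive depends on
    the region of the alpha-beta plane the point lies in.
    """
    weights = (1 - alpha - beta, alpha, beta)
    return 2 * eps * sum(max(w, 0.0) for w in weights)


if __name__ == "__main__":
    eps = 1.0
    for alpha, beta in [(0.25, 0.25), (2.0, 1.0), (-1.0, 0.5)]:
        print(alpha, beta,
              disc_radius(alpha, beta, eps),
              disc_radius_by_region(alpha, beta, eps))
```

For points inside the basis triangle (all three weights non-negative) the radius is the constant $2\epsilon$; outside it, the radius grows linearly with the positive weights.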

We can bound the expressions defining the radius of the uncertainty disc by noting
that

$$1 + |\alpha| + |\beta| \le (|1 - \alpha - \beta| + |\alpha| + |\beta| + 1) \le 2(1 + |\alpha| + |\beta|).$$

We have thus established the following result:

Figure 2: Diagram of error effects. The region of feasible points is a disc, whose
radius is given by the indicated expression, depending on the values of $\alpha$ and
$\beta$.

Figure 3: Diagram of error effects. A set of four model points are shown on the left.
The positions of four image points are shown on the right, three of which are used to
establish a basis. The actual position of each transformed model point corresponding
to the basis image points is offset by an error vector of bounded magnitude. The
coordinates of the fourth point, written in terms of the basis vectors, can thus vary
from the ideal case, shown in solid lines, to cases such as that shown in dashed
lines. This leads to a disc of variable size in which the corresponding fourth model
point could lie.

Proposition 1 The range of image locations that is consistent with a given pair of
affine coordinates $(\alpha, \beta)$ is a disc of radius $r$, where

$$\epsilon(1 + |\alpha| + |\beta|) \le r \le 2\epsilon(1 + |\alpha| + |\beta|)$$

and where $\epsilon$ is a positive constant that bounds the positional uncertainty of
the image data.

The  effect  of  this  circular  uncertainty  region  for  the  location  of  x  is  illustrated  in 
Figure  3.  The  positional  uncertainty  in  the  locations  of  the  three  image  basis  points 
results  in  a  circle  of  possible  locations  for  the  fourth  point.  The  error  in  measuring 
the  fourth  point  itself  increases  the  radius  of  this  error  disc. 

The expression in Proposition 1 allows the calculation of error bounds for any method
based on two-dimensional affine transformations, such as [2, 12, 21]. In particular,
if $|\alpha|$ and $|\beta|$ are both less than 1, then the error in the position of a
point is at most $6\epsilon$. This condition can be met by using as the affine basis
three points $m_1$, $m_2$ and $m_3$ that lie on the convex hull of the set of model
points, and are maximally separated from one another.

It should be noted that the expression in Proposition 1 is independent of the actual
locations of the model or image points. This means that the possible positions of the
fourth point vary only with the sensor error and the values of $\alpha$ and $\beta$.
They do not vary with the configuration of the model basis (e.g., even if close to
collinear) nor do they vary with the configuration of the image basis. In other words,
the error range does not depend on the viewing direction. Even if the model is viewed
end on, so that all three model points appear nearly collinear, or if the model is
viewed at a small scale, so that all three model points are close together, the size
of the region of possible locations of the fourth model point in the image will remain
unchanged.

The viewing direction does, however, greatly affect the affine coordinate system
defined by the three projected model points. Thus the set of possible affine
coordinates of the fourth point, when considered directly in $\alpha$-$\beta$-space,
will vary greatly. Our next goal is to characterize this set of affine coordinates.
This can be done by making use of Proposition 1, which tells us the set of image
locations consistent with a fourth point. Implicit in this analysis is the set of
affine transformations that produce possible fourth image point locations. This can in
turn be used to characterize the range of $(\alpha, \beta)$ values that are consistent
with a given set of four points.

We will do the analysis using the upper bound on the radius of the error disc from
Proposition 1. In actuality, the analysis is slightly more complicated, because the
expression governing the disc radius varies as shown in Figure 2. For our purposes,
however, considering the extreme case is sufficient. It should also be noted from the
figure that the extreme case is in fact quite close to the actual value over much of
the range of $\alpha$ and $\beta$.

Given a triple of image points that form a basis, and a fourth image point, $s_4$, we
are interested in determining the range of affine coordinates for the fourth point
that are consistent with the possibly erroneous image measurements. In effect, each
sensor point takes on a range of possible values, and each quadruple of such values
produces a possibly distinct value using equation (1). As illustrated in Figure 4 we
could determine all the feasible values by varying the basis vectors over the
uncertainty discs associated with their endpoints, finding the set of
$(\alpha', \beta')$ values such that the resulting point in this affine basis lies
within $\epsilon$ of the original point. By our previous results, however, it is
equivalent to find affine coordinates $(\alpha', \beta')$ such that the Euclidean
distance from

$$s_1 + \alpha'(s_2 - s_1) + \beta'(s_3 - s_1)$$

to

$$s_1 + \alpha(s_2 - s_1) + \beta(s_3 - s_1)$$

is bounded above by $2\epsilon(1 + |\alpha'| + |\beta'|)$.

The boundary of the region of such points $(\alpha', \beta')$ is defined by requiring
the distance from the nominal image point

$$s_4 = s_1 + \alpha(s_2 - s_1) + \beta(s_3 - s_1)$$

to be $2\epsilon(1 + |\alpha'| + |\beta'|)$, which is when

$$[2\epsilon(1 + |\alpha'| + |\beta'|)]^2 = [(\alpha - \alpha')u]^2 + 2(\beta - \beta')(\alpha - \alpha')uv\cos\phi + [(\beta - \beta')v]^2 \qquad (5)$$

Figure 4: The example on the left shows the canonical example of affine coordinates.
The fourth point is offset from the origin point by the sum of $\alpha$ times the first
basis vector $\mathbf{u}$ plus $\beta$ times the second basis vector $\mathbf{v}$. The
example on the right shows a second consistent set of affine coordinates. By taking
other vectors that lie within the uncertainty regions of each of the image points, we
can find a different set of affine coordinates $\alpha'$, $\beta'$ such that the new
fourth point based on these coordinates also lies within the uncertainty bound of the
image point.
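where

$$\mathbf{u} = s_2 - s_1, \quad \mathbf{v} = s_3 - s_1, \quad u = \|\mathbf{u}\|, \quad v = \|\mathbf{v}\|,$$

and where the angle made by the image basis vectors $s_2 - s_1$ and $s_3 - s_1$ is
$\phi$. Considered as an implicit function of $\alpha'$ and $\beta'$, equation (5)
defines a conic. If we expand out equation (5), we get

$$a_{11}(\alpha')^2 + 2a_{12}\alpha'\beta' + a_{22}(\beta')^2 + 2a_{13}\alpha' + 2a_{23}\beta' + a_{33} = 0 \qquad (6)$$

where

$$a_{11} = u^2 - 4\epsilon^2$$
$$a_{22} = v^2 - 4\epsilon^2$$
$$a_{12} = uv\cos\phi - 4 s_\alpha s_\beta \epsilon^2$$
$$a_{13} = -u[\alpha u + \beta v\cos\phi] - 4 s_\alpha \epsilon^2$$
$$a_{23} = -v[\alpha u\cos\phi + \beta v] - 4 s_\beta \epsilon^2$$
$$a_{33} = \alpha^2u^2 + 2\alpha\beta uv\cos\phi + \beta^2v^2 - 4\epsilon^2$$

and where

$$s_\alpha = \begin{cases} 1 & \text{if } \alpha' \ge 0, \\ -1 & \text{if } \alpha' < 0, \end{cases} \qquad s_\beta = \begin{cases} 1 & \text{if } \beta' \ge 0, \\ -1 & \text{if } \beta' < 0. \end{cases}$$

For notational simplicity in what follows, it is convenient to assume that $\alpha$ and
$\beta$ are positive, so that $s_\alpha = 1$, $s_\beta = 1$.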
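The expansion from equation (5) to equation (6) is easy to get wrong by a sign, so a numerical cross-check is useful. The Python sketch below is an illustration under stated assumptions: the function names and the sample values are hypothetical, and the signs $s_\alpha$, $s_\beta$ are passed in explicitly. It evaluates the conic of equation (6) and the difference of the two sides of equation (5) at the same $(\alpha', \beta')$; the two should coincide whenever the supplied signs match the signs of $\alpha'$ and $\beta'$.

```python
import math


def conic_coefficients(u, v, phi, alpha, beta, eps, s_a=1, s_b=1):
    """Coefficients a_ij of equation (6); s_a, s_b are the signs of alpha', beta'."""
    c = math.cos(phi)
    e = 4.0 * eps * eps
    return {
        "a11": u * u - e,
        "a22": v * v - e,
        "a12": u * v * c - e * s_a * s_b,
        "a13": -u * (alpha * u + beta * v * c) - e * s_a,
        "a23": -v * (alpha * u * c + beta * v) - e * s_b,
        "a33": alpha ** 2 * u * u + 2 * alpha * beta * u * v * c
               + beta ** 2 * v * v - e,
    }


def conic_value(k, ap, bp):
    """Left-hand side of equation (6) at (alpha', beta') = (ap, bp)."""
    return (k["a11"] * ap * ap + 2 * k["a12"] * ap * bp + k["a22"] * bp * bp
            + 2 * k["a13"] * ap + 2 * k["a23"] * bp + k["a33"])


def equation5_residual(u, v, phi, alpha, beta, eps, ap, bp):
    """Right-hand side minus left-hand side of equation (5)."""
    da, db = alpha - ap, beta - bp
    dist2 = (da * u) ** 2 + 2 * da * db * u * v * math.cos(phi) + (db * v) ** 2
    return dist2 - (2 * eps * (1 + abs(ap) + abs(bp))) ** 2


if __name__ == "__main__":
    args = (10.0, 8.0, 1.0, 0.4, 0.7, 0.5)   # u, v, phi, alpha, beta, eps
    k = conic_coefficients(*args)
    print(conic_value(k, 0.6, 0.9), equation5_residual(*args, 0.6, 0.9))
```

A negative value means $(\alpha', \beta')$ lies inside the region of consistent coordinates, and a positive value means it lies outside.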

We can use this form to compute the invariant characteristics of a conic [15]:

$$I = u^2 + v^2 - 8\epsilon^2 \qquad (7)$$
$$D = u^2v^2\sin^2\phi - 4\epsilon^2(u^2 - 2uv s_\alpha s_\beta\cos\phi + v^2) \qquad (8)$$
$$\Delta = -4\epsilon^2u^2v^2\sin^2\phi\,(1 + s_\alpha\alpha + s_\beta\beta)^2 \qquad (9)$$

If $u^2 + v^2 > 8\epsilon^2$, then $\Delta / I < 0$. Furthermore, if

$$u^2v^2\sin^2\phi > 4\epsilon^2(u^2 - 2uv s_\alpha s_\beta\cos\phi + v^2)$$

then $D > 0$ and the conic defined by equation (5) is an ellipse. We will ignore the
degenerate cases in which the conic is not an ellipse. Such cases only occur either
when the image basis points are very close together, or when the image basis points
are nearly collinear. For instance, as long as the image basis vectors $\mathbf{u}$
and $\mathbf{v}$ are each longer than $2\epsilon$, then $u^2 + v^2 > 8\epsilon^2$.
Similarly, as long as $\sin\phi$ is not small, $D > 0$. In fact, cases where these
conditions do not hold will be very unstable and thus should be avoided anyway.
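The closed forms (7) to (9) can be compared against the generic definitions of the conic invariants, namely the trace and determinant of the quadratic part and the determinant of the full $3 \times 3$ coefficient matrix of equation (6). The Python sketch below does this; the function names and sample values are assumptions for illustration, and the ellipse test implements only the two conditions stated above.

```python
import math


def conic_matrix(u, v, phi, alpha, beta, eps, s_a=1, s_b=1):
    """Symmetric 3x3 coefficient matrix of equation (6)."""
    c = math.cos(phi)
    e = 4.0 * eps * eps
    a11, a22 = u * u - e, v * v - e
    a12 = u * v * c - e * s_a * s_b
    a13 = -u * (alpha * u + beta * v * c) - e * s_a
    a23 = -v * (alpha * u * c + beta * v) - e * s_b
    a33 = alpha ** 2 * u * u + 2 * alpha * beta * u * v * c + beta ** 2 * v * v - e
    return [[a11, a12, a13], [a12, a22, a23], [a13, a23, a33]]


def invariants_from_matrix(m):
    """I = a11 + a22, D = a11*a22 - a12^2, Delta = det of the full matrix."""
    big_i = m[0][0] + m[1][1]
    big_d = m[0][0] * m[1][1] - m[0][1] ** 2
    delta = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] ** 2)
             - m[0][1] * (m[0][1] * m[2][2] - m[1][2] * m[0][2])
             + m[0][2] * (m[0][1] * m[1][2] - m[1][1] * m[0][2]))
    return big_i, big_d, delta


def invariants_closed_form(u, v, phi, alpha, beta, eps, s_a=1, s_b=1):
    """Equations (7), (8) and (9)."""
    s2 = math.sin(phi) ** 2
    e = 4.0 * eps * eps
    big_i = u * u + v * v - 2 * e
    big_d = u * u * v * v * s2 - e * (u * u - 2 * u * v * s_a * s_b * math.cos(phi) + v * v)
    delta = -e * u * u * v * v * s2 * (1 + s_a * alpha + s_b * beta) ** 2
    return big_i, big_d, delta


def is_ellipse(big_i, big_d, delta):
    """True when D > 0 and Delta / I < 0 (tested as Delta * I < 0)."""
    return big_d > 0 and delta * big_i < 0


if __name__ == "__main__":
    args = (10.0, 8.0, 1.0, 0.4, 0.7, 0.5)   # u, v, phi, alpha, beta, eps
    print(invariants_from_matrix(conic_matrix(*args)))
    print(invariants_closed_form(*args))
```

With a basis whose vectors are short relative to $\epsilon$ (for example $u = v = 0.5$ with $\epsilon = 0.5$), `is_ellipse` returns false, which corresponds to the degenerate cases the text sets aside.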

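Given the conic invariants, we can compute a number of characteristics of the ellipse.
The area of the ellipse is given by

$$\frac{4\pi\epsilon^2u^2v^2\sin^2\phi\,(1 + s_\alpha\alpha + s_\beta\beta)^2}{[u^2v^2\sin^2\phi - 4\epsilon^2(u^2 - 2uv s_\alpha s_\beta\cos\phi + v^2)]^{3/2}}. \qquad (10)$$

The center of the ellipse is at

$$\alpha_0 = \frac{1}{D}\left[\alpha u^2v^2\sin^2\phi - 4\epsilon^2(\alpha u^2 - s_\alpha(1 + s_\beta\beta)v^2 + uv\cos\phi(\beta + s_\beta(1 - s_\alpha\alpha)))\right]$$
$$\beta_0 = \frac{1}{D}\left[\beta u^2v^2\sin^2\phi - 4\epsilon^2(\beta v^2 - s_\beta(1 + s_\alpha\alpha)u^2 + uv\cos\phi(\alpha + s_\alpha(1 - s_\beta\beta)))\right]. \qquad (11)$$

The angle of the principal axes, $\Phi$, with respect to the $\alpha$ axis is

$$\tan 2\Phi = \frac{2[uv\cos\phi - 4\epsilon^2 s_\alpha s_\beta]}{u^2 - v^2}. \qquad (12)$$

Thus we have established the following: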

Proposition 2 Given bounded errors of $\epsilon$ in the measurement of the image
points, the region of uncertainty associated with a pair of affine coordinates
$(\alpha, \beta)$ in $\alpha$-$\beta$-space is an ellipse. The area of this ellipse is
given by equation (10), the center is at $(\alpha_0, \beta_0)$ as given by equation
(11), and the orientation is given by equation (12).
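A direct transcription of equations (10) to (12) into Python is sketched below, under the assumption that the conic is a proper ellipse ($D > 0$). The function name and the sample values are hypothetical. The orientation is returned via `atan2`, which selects one of the two perpendicular principal axes; this choice is an implementation convenience and is not specified in the text.

```python
import math


def ellipse_parameters(u, v, phi, alpha, beta, eps, s_a=1, s_b=1):
    """Area (10), center (11) and axis angle (12) of the uncertainty ellipse."""
    c = math.cos(phi)
    s2 = math.sin(phi) ** 2
    e = 4.0 * eps * eps                      # shorthand for 4 * eps^2
    big_d = u * u * v * v * s2 - e * (u * u - 2 * u * v * s_a * s_b * c + v * v)
    if big_d <= 0:
        raise ValueError("degenerate configuration: conic is not an ellipse")
    w = 1 + s_a * alpha + s_b * beta
    area = math.pi * e * u * u * v * v * s2 * w * w / big_d ** 1.5
    alpha0 = (alpha * u * u * v * v * s2
              - e * (alpha * u * u - s_a * (1 + s_b * beta) * v * v
                     + u * v * c * (beta + s_b * (1 - s_a * alpha)))) / big_d
    beta0 = (beta * u * u * v * v * s2
             - e * (beta * v * v - s_b * (1 + s_a * alpha) * u * u
                    + u * v * c * (alpha + s_a * (1 - s_b * beta)))) / big_d
    # tan(2 * angle) = 2 * a12 / (a11 - a22) = 2 * a12 / (u^2 - v^2)
    angle = 0.5 * math.atan2(2 * (u * v * c - e * s_a * s_b), u * u - v * v)
    return area, (alpha0, beta0), angle


if __name__ == "__main__":
    area, center, angle = ellipse_parameters(10.0, 8.0, 1.0, 0.4, 0.7, 0.5)
    print(area, center, angle)
```

One sanity check on the formulas: as $\epsilon$ becomes small relative to $u$ and $v$, the area approaches $4\pi\epsilon^2(1 + \alpha + \beta)^2 / (uv\sin\phi)$, which is the area of an image-plane disc of radius $2\epsilon(1 + \alpha + \beta)$ mapped into the affine frame.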


In other words, given a set of four points whose locations are only known to within
discs of radius $\epsilon$, there is an ellipse-shaped region of possible
$(\alpha, \beta)$ values specifying the location of one point with respect to the
other three. Thus if we compare $(\alpha, \beta)$ values generated by some model of an
object with those specified by an image, when there is $\epsilon$-uncertainty in the
image data, each image datum actually specifies an ellipse of $(\alpha, \beta)$ values.
The area of this ellipse depends on the degree of sensor uncertainty, $\epsilon$, the
values of $\alpha$ and $\beta$, and the configuration of the three image points that
form the basis. In order to compare the model values with image values it is necessary
to check that the affine-invariant coordinates for each model point lie within the
elliptical region of possible affine-invariant values associated with the
corresponding image point.
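The check described above does not require the ellipse parameters explicitly. Since the region is defined by the distance bound $2\epsilon(1 + |\alpha'| + |\beta'|)$, a model point's coordinates can be tested directly against that inequality. The following Python sketch shows one way to do this; the function name and the sample points are assumptions for illustration, and the test uses the upper-bound radius from Proposition 1, as the analysis in this section does.

```python
import math


def consistent(model_ab, s1, s2, s3, s4, eps):
    """Return True if model affine coordinates (a, b) fall in the region
    associated with image point s4 under image basis (s1, s2, s3).

    The test is the inequality behind equation (5): the point predicted from
    the image basis using (a, b) must lie within 2*eps*(1 + |a| + |b|) of s4.
    """
    a, b = model_ab
    px = s1[0] + a * (s2[0] - s1[0]) + b * (s3[0] - s1[0])
    py = s1[1] + a * (s2[1] - s1[1]) + b * (s3[1] - s1[1])
    return math.hypot(px - s4[0], py - s4[1]) <= 2 * eps * (1 + abs(a) + abs(b))


if __name__ == "__main__":
    # Hypothetical image basis and fourth image point.
    s1, s2, s3, s4 = (0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (5.0, 5.0)
    for ab in [(0.5, 0.5), (0.6, 0.5), (1.0, 0.5)]:
        print(ab, consistent(ab, s1, s2, s3, s4, eps=0.5))
```

Because the absolute values are evaluated directly, this form needs no case split on the signs $s_\alpha$ and $s_\beta$.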

The fact that the regions of consistent parameters in $\alpha$-$\beta$-space are
ellipses causes some difficulties for discrete hashing schemes, such as the one
employed by geometric hashing. This is discussed in greater detail in a later section,
but the basic idea of the geometric hashing method is to compute affine coordinates of
model points with respect to some choice of basis, and to use these affine coordinates
as the hash keys to store the basis in a table. In general, the implementations of
this method use square buckets to tessellate the hash space (the
$\alpha$-$\beta$-space). In this case, we see that even if we chose buckets whose size
is commensurate with the ellipse, several such buckets are likely to intersect any
given ellipse due to the difference in shape of the two regions. Thus, it is necessary
to hash to multiple buckets, and this increases the probability that a random pairing
of model and image bases will receive a significant number of votes.
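To illustrate why a single ellipse touches several square buckets, the Python sketch below computes the axis-aligned bounding box of the conic of equation (6) and lists the grid cells that the box overlaps. This is an over-approximation chosen for simplicity, not the scheme of any geometric hashing implementation: the function names, the bucket side length and the sample values are assumptions, and the sketch takes $s_\alpha = s_\beta = 1$, so it applies only when the region lies in the positive quadrant.

```python
import math


def region_bounding_box(u, v, phi, alpha, beta, eps):
    """Bounding box (lo_a, hi_a, lo_b, hi_b) of the ellipse of equation (6),
    assuming s_alpha = s_beta = 1."""
    c = math.cos(phi)
    e = 4.0 * eps * eps
    a11, a22, a12 = u * u - e, v * v - e, u * v * c - e
    a13 = -u * (alpha * u + beta * v * c) - e
    a23 = -v * (alpha * u * c + beta * v) - e
    big_d = a11 * a22 - a12 * a12
    if big_d <= 0 or a11 <= 0:
        raise ValueError("degenerate configuration: conic is not an ellipse")
    # Center: the point where both partial derivatives of the conic vanish.
    a0 = (a12 * a23 - a22 * a13) / big_d
    b0 = (a12 * a13 - a11 * a23) / big_d
    delta = -e * u * u * v * v * math.sin(phi) ** 2 * (1 + alpha + beta) ** 2
    # Extreme offsets of a11 X^2 + 2 a12 X Y + a22 Y^2 = -delta / D.
    half_a = math.sqrt(-delta * a22) / big_d
    half_b = math.sqrt(-delta * a11) / big_d
    return a0 - half_a, a0 + half_a, b0 - half_b, b0 + half_b


def buckets_overlapping_box(box, side):
    """Indices (i, j) of square buckets of the given side overlapping the box."""
    lo_a, hi_a, lo_b, hi_b = box
    cols = range(math.floor(lo_a / side), math.floor(hi_a / side) + 1)
    rows = range(math.floor(lo_b / side), math.floor(hi_b / side) + 1)
    return [(i, j) for i in cols for j in rows]


if __name__ == "__main__":
    box = region_bounding_box(10.0, 8.0, 1.0, 0.4, 0.7, 0.5)
    print(box)
    print(buckets_overlapping_box(box, side=0.5))
```

The bucket list is a superset of the buckets that actually intersect the ellipse, so using it errs on the side of extra votes rather than missed matches.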

A further problem for discrete hashing schemes is the fact that the size of the
ellipse increases as a function of $(1 + |\alpha| + |\beta|)^2$. Thus points with
larger affine coordinates give rise to larger ellipses than those with smaller
coordinates. The contours along which the centers of equal-sized ellipses lie are
parabolic arcs (i.e. contours of constant $1 + |\alpha| + |\beta|$), rather than
circles. Either one must hash a given value to many buckets, or one must account for
this effect by sampling the space in a manner that varies with parabolic distance, but
this would require some careful analysis.

The most critical issue for discrete hashing schemes is the fact that the shape,
orientation and position of the ellipse depend on the specific image basis chosen.
That is, the orientation of the ellipse changes as $u$, $v$ and $\phi$ change (which
are parameters computed from the image basis). This means that there is no clear way
to fill the hash table as a pre-processing step, independent of a given image, which
is a crucial part of the geometric hashing method. The problem is that the error
ellipse associated with a given $(\alpha, \beta)$ pair depends on the characteristics
of the image basis, and we don't know that ahead of time. There is no way to
pre-compute these error regions because they depend inherently on the image point
configuration. This means it is either necessary to approximate the ellipses by
assuming bounds on the possible image basis, which will allow both false positive and
false negative hits in the hash table, or to compute the ellipse to access at run
time. Note that the geometric hashing method does not address any of these issues. It
is simply assumed that some 'appropriate' tessellation of the image space exists.

In  summary,  in  this  section  we  have  characterized  the  range  of  image  coordinates 
and the range of (α, β) values that are consistent with a given point, with respect
to  some  basis,  when  there  is  uncertainty  in  the  image  data.  In  the  following  section 
we  analyze  what  fraction  of  all  possible  points  (in  some  bounded  image  region)  are 
consistent with a given range of (α, β) values. Then in the subsequent sections
we  use  this  to  derive  expressions  for  the  probability  of  a  false  match  for  both  the 
geometric  hashing  method  and  the  alignment  method. 

3  The  Selectivity  of  Affine-Invariant  Representations 

We are interested in determining the probability that an object recognition system
will  erroneously  report  an  instance  of  an  object  in  an  image.  Recall  that  such  an 
instance  in  general  is  specified  by  giving  a  transformation  from  model  coordinates 
to  image  coordinates,  and  a  measure  of  ‘quality’  based  on  the  number  of  model 
features  that  are  paired  with  image  features  under  this  transformation.  Thus  we 
are interested in whether a random association of model and image features can
occur  in  sufficient  number  to  masquerade  as  a  correct  solution.  We  use  the  results 
developed  above  in  order  to  determine  the  probability  of  such  a  false  match.  There 
are  two  stages  to  this  analysis;  the  first  is  a  statistical  analysis  that  is  independent 
of  the  given  recognition  method,  and  the  second  is  a  combinatorial  analysis  that 
depends  on  the  particular  recognition  method.  In  this  section  we  examine  the  first 
stage,  and  then  in  subsequent  sections  we  turn  to  the  analysis  of  the  geometric 
hashing  and  alignment  methods. 

In order to determine the probability that a match will be falsely reported we
need to know the ‘selectivity’ of a quadruple of model points. Recall from Figure 1
that each model point is mapped to a point in α-β-space with respect to a particular
model basis (triple). Similarly each image point, modeled as a disc, is mapped to
an elliptical region of possible points in α-β-space. Each such image region that
contains one or more model points specifies an image point that is consistent with
the given model. Thus we need to estimate the probability that a given image basis
and fourth image point chosen at random will map to a region of α-β-space that
is consistent with one of the model points written in terms of some model basis.
One way of characterizing this is in terms of the proportion of the α-β-space that
is consistent with a given basis and fourth point (where the size of the space is
bounded in some way). As was shown above, the elliptical regions in α-β-space are
equivalent to circular regions in image space. Thus, for ease of analysis we choose
to work with the formulation in terms of circles in image space.

To determine the selectivity, we assume we are given some image basis and a
potential corresponding model basis. Each of the remaining m - 3 model points
is defined by its affine coordinates relative to the model basis. These can then be
transformed into the image domain, by using the same affine coordinates, with
respect to the image basis. Because of the uncertainty of the image points, there
is an uncertainty in the associated affine transformation. This manifests itself as
a range of possible positions for the model points, as they are transformed into
the image. Previously we determined that a transformed model point had to be
within 2ε(1 + |α| + |β|) of an image point in order to match it. That calculation
took into account error in the matched image point as well as the basis image
points. Therefore, placing an appropriately sized disc about each model point is
equivalent to placing an ε-sized disc about each image point. We thus represent each
transformed model point as giving rise to a disc of some radius, positioned relative
to the nominal position of the model point with respect to the image basis. For
convenience, we use the upper bound on the size of the radius, 2ε(1 + |α| + |β|). For
each model point, rewritten in image coordinates, we need to know the probability
that at least one image point lies in the associated error disc about the transformed
model point, because if this happens it means that there is a consistent model and
image point for the given model and image basis. To estimate this probability,
we need to estimate the expected size of the disc. Since the disc size varies with
|α| + |β|, this means we need an estimate of the distribution of points with respect
to affine coordinates. In fact, by Figure 2 we should find the distribution of points
as a function of (α, β) since the disc size varies with these values. This is messy,
and thus we use an approximation instead.

Figure 5:

Histogram of distribution of |α| + |β| values. Vertical axis is ratio of number of samples to total
samples, horizontal axis is value for |α| + |β|. The maximum over 300,000 samples was 51. Only
the first portion of the graph is displayed.

For this approximation, we measure the distribution with respect to ρ, where
ρ = |α| + |β|, since both the upper and lower bounds on the disc size are functions
of this variable. Intuitively we expect the distribution to vary inversely with ρ. To
verify this, we ran the following experiment. A set of 25 points was generated at
random, with the property that their pairwise minimum separation was at least 25
pixels, and their pairwise maximum separation was at most 250 pixels. All possible
bases were selected, and for each basis for which the angle between the axes was
at least π/16, all the other model points were rewritten in terms of affine invariant
coordinates (α, β). This gave roughly 300,000 samples, which we histogrammed
with respect to ρ(α, β) = |α| + |β|. We found that the maximum value for ρ in
this case was roughly 51. In general, however, almost all of the values were much
smaller, and indeed, the distribution showed a strong inverse drop off, as can be
seen from Figure (5).
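
An experiment of this kind can be sketched as follows. This is an illustrative reconstruction rather than the code used for the reported results: the point generator, the square sampling region, the treatment of the angle limit as φ_0 ≤ φ ≤ π - φ_0, and the bin width are all assumptions. With 25 points there are 25·24·23 ordered bases and 22 remaining points for each, so at most 303,600 samples before the angle filter; the sketch uses fewer points to keep the run short.

```python
import itertools
import math
import random


def random_points(n, min_sep, max_sep, rng):
    """Rejection-sample n points with pairwise separation in [min_sep, max_sep]."""
    side = max_sep / math.sqrt(2.0)  # the diagonal of this square equals max_sep
    pts = []
    while len(pts) < n:
        p = (rng.uniform(0.0, side), rng.uniform(0.0, side))
        if all(math.dist(p, q) >= min_sep for q in pts):
            pts.append(p)
    return pts


def rho_samples(pts, phi0):
    """Return |alpha| + |beta| for every admissible ordered basis and other point."""
    out = []
    for i, j, k in itertools.permutations(range(len(pts)), 3):
        o = pts[i]
        ux, uy = pts[j][0] - o[0], pts[j][1] - o[1]
        vx, vy = pts[k][0] - o[0], pts[k][1] - o[1]
        det = ux * vy - uy * vx
        phi = math.atan2(abs(det), ux * vx + uy * vy)  # unsigned angle between axes
        if not (phi0 <= phi <= math.pi - phi0):
            continue  # skip unstable, nearly collinear bases
        for n, p in enumerate(pts):
            if n in (i, j, k):
                continue
            px, py = p[0] - o[0], p[1] - o[1]
            alpha = (px * vy - py * vx) / det  # solves p - o = alpha*u + beta*v
            beta = (ux * py - uy * px) / det
            out.append(abs(alpha) + abs(beta))
    return out


def histogram(values, width=0.5):
    """Map bin index to the fraction of samples in [index*width, (index+1)*width)."""
    counts = {}
    for v in values:
        b = int(v // width)
        counts[b] = counts.get(b, 0) + 1
    return {b: c / len(values) for b, c in sorted(counts.items())}


rng = random.Random(0)
points = random_points(12, 25.0, 250.0, rng)
samples = rho_samples(points, math.pi / 16)
hist = histogram(samples)
print(len(samples), max(samples))
```

Printing the first few entries of `hist` gives the fractions that would be plotted along the vertical axis of a histogram like the one in Figure 5.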

Given this evidence, we considered two different models for the distribution of
points in affine coordinates. The first is:

$$
\delta(\rho) = \begin{cases} k\rho & \rho \le 1 \\ k\rho^{-2} & \rho > 1. \end{cases} \qquad (13)
$$

Figure  (6)  illustrates  the  fit  of  this  to  the  actual  data.  The  second  is: 

( kp  p<l 

p>l.  (14) 

Figure  (7)  illustrates  the  fit  of  this  to  the  actual  data. 
Figure 6:

Histogram of distribution of |α| + |β| values. Vertical axis is ratio of number of samples to total
samples, horizontal axis is value for |α| + |β|. The maximum over 300,000 samples was 51. Only
the first portion of the graph is displayed. Overlayed with this is a ρ^{-2} distribution.


Figure 7: Histogram of distribution of |α| + |β| values. Vertical axis is number of
samples, horizontal axis value for ρ = |α| + |β|. The maximum over 300,000 samples
was 51. Only the first portion of the graph is displayed. Overlayed with this is a
p'j  distribution. 
We choose to use the first model, because it underestimates the probability for
large values of ρ, at the cost of overestimating it for small values of ρ. Since we are
interested in finding the expected size of the error disc, and this grows with ρ, such
an approximation will underestimate the size of the disc.

First, we integrate equation (13) over all possible values and normalize to 1 in
order to deduce the constant:

$$
k = \frac{2}{3 - \frac{2}{\rho_m}} \qquad (15)
$$

where ρ_m is the maximum value for ρ (and ρ = |α| + |β|).
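
As a quick check of the normalization, the sketch below takes the first model to be kρ for ρ ≤ 1 and kρ^{-2} for ρ > 1, so that the integral over [0, ρ_m] is k(3/2 - 1/ρ_m) and hence k = 2/(3 - 2/ρ_m). The function names and the midpoint-rule integrator are illustrative choices, and ρ_m = 51 is simply the maximum observed in the experiment above.

```python
def delta(rho, k):
    """First distribution model: k*rho up to rho = 1, k*rho**-2 beyond."""
    return k * rho if rho <= 1.0 else k / rho ** 2


def normalizing_k(rho_m):
    """Constant k that makes delta integrate to 1 over [0, rho_m]."""
    return 2.0 / (3.0 - 2.0 / rho_m)


def midpoint_integral(f, lo, hi, steps=20000):
    """Plain midpoint rule; accurate enough for a sanity check."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


rho_m = 51.0
k = normalizing_k(rho_m)
# Integrate the two branches separately so the kink at rho = 1 is a bin edge.
total = (midpoint_integral(lambda r: delta(r, k), 0.0, 1.0)
         + midpoint_integral(lambda r: delta(r, k), 1.0, rho_m))
print(k, total)
```

For large ρ_m the constant tends to 2/3, so k is only weakly sensitive to the exact bound on ρ.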

Next, we want to find the expected area of a disc in image space. Recall that
we are going to examine the upper bound on the disc size, so that in principle, this
area is just

$$
4\pi\epsilon^2 (1 + \rho)^2 .
$$

We could simply integrate this with respect to the distribution from equation (13)

$$
\int_{\rho=0}^{\rho_m} 4\pi\epsilon^2 (1 + \rho)^2 \, \delta(\rho) \, d\rho .
$$

This,  however,  ignores  the  fact  that  the  image  is  of  finite  size  (say  each  dimension 
is  2r),  and  some  of  the  disc  may  lie  beyond  the  bounds  of  the  image.  We  therefore 
separate  out  four  different  cases. 

The first case is for ρ ≤ 1. Here we get

$$
A_1 = \int_{\rho=0}^{1} 4\pi\epsilon^2 (1 + \rho)^2 \, k\rho \, d\rho = 4\pi\epsilon^2 k \, \frac{17}{12}. \qquad (16)
$$

The second case considers discs that will lie entirely within the bounds of the image.
Consider Figure 8, which shows the limiting case, assuming that the coordinate
frame of the basis is centered at the center of the image, and the image dimensions
are 2r by 2r. In this case, we have

$$
r - p \ge \gamma
$$

where γ = 2ε(1 + ρ) and p is the distance of the point from the origin of the basis.
In general, we have p ≤ ρd where d is the separation between
two of the basis points in the image, and this leads to the condition that if 1 < ρ ≤ c_1
where

$$
c_1 = \frac{r - 2\epsilon}{d + 2\epsilon}
$$

then the discs will all lie entirely within the image. Thus the second case is

$$
A_2 = \int_{\rho=1}^{c_1} 4\pi\epsilon^2 (1 + \rho)^2 \, k\rho^{-2} \, d\rho
    = 4\pi\epsilon^2 k \left[ c_1 + 2\log c_1 - \frac{1}{c_1} \right]
    = 4\pi\epsilon^2 k \left[ 2\log\frac{r - 2\epsilon}{d + 2\epsilon} + \frac{r^2 - d^2 - 4\epsilon(r + d)}{(d + 2\epsilon)(r - 2\epsilon)} \right]. \qquad (17)
$$

The final expansion is based on the assumption that ρ_m > c_1, which is true for
virtually all cases of interest.

Figure 8:

Case 2. Limiting case of an error disc lying entirely within the image, assuming the coordinate
basis is centered in the image.

Figure 9:

Case 3. Example of case in which the image error disc does not lie entirely within the bounds of
the image.

Figure 10:

Case 3. Underestimate of the area of the image error disc.
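
The closed forms for the first two cases can be checked numerically. The sketch below assumes the distribution kρ for ρ ≤ 1 and kρ^{-2} for ρ > 1, the disc area 4πε²(1 + ρ)², and the limit c_1 = (r - 2ε)/(d + 2ε), which follows from requiring r - ρd ≥ 2ε(1 + ρ). The parameter values and function names are illustrative only.

```python
import math


def c1(r, d, eps):
    """Largest rho for which the error disc still fits inside the image."""
    return (r - 2.0 * eps) / (d + 2.0 * eps)


def area_case1(eps, k):
    """Closed form for the rho <= 1 contribution."""
    return 4.0 * math.pi * eps ** 2 * k * 17.0 / 12.0


def area_case2(eps, k, r, d):
    """Closed form for 1 < rho <= c1, written in terms of c1."""
    c = c1(r, d, eps)
    return 4.0 * math.pi * eps ** 2 * k * (c + 2.0 * math.log(c) - 1.0 / c)


def area_case2_expanded(eps, k, r, d):
    """The same quantity written out in terms of r, d and eps."""
    num = r * r - d * d - 4.0 * eps * (r + d)
    den = (d + 2.0 * eps) * (r - 2.0 * eps)
    return 4.0 * math.pi * eps ** 2 * k * (2.0 * math.log(c1(r, d, eps)) + num / den)


def midpoint_integral(f, lo, hi, steps=20000):
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


eps, r, d = 3.0, 250.0, 50.0
k = 2.0 / (3.0 - 2.0 / 51.0)


def disc(rho):
    return 4.0 * math.pi * eps ** 2 * (1.0 + rho) ** 2


n1 = midpoint_integral(lambda rho: disc(rho) * k * rho, 0.0, 1.0)
n2 = midpoint_integral(lambda rho: disc(rho) * k / rho ** 2, 1.0, c1(r, d, eps))
print(area_case1(eps, k), n1)
print(area_case2(eps, k, r, d), area_case2_expanded(eps, k, r, d), n2)
```

The two closed forms for the second case agree to rounding error, and both match the numerical integral, which is a useful guard against sign slips in the expanded expression.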

In the third case, when ρ_m > c_1, for values of ρ > c_1 there is some truncation
of the disc. This situation is shown in Figure 9. In this case the area of the portion
of the disc lying inside the image is given by

$$
\gamma^2 \left[ \frac{\pi}{2} + \sin^{-1}\frac{r - p}{\gamma} \right] + (r - p)\sqrt{\gamma^2 - (r - p)^2}. \qquad (18)
$$

Integrating this with respect to the distribution δ(ρ) is messy. Because we
are interested in underestimating the expected area of the discs, we can use the
following approximation. For values of ρ ranging from c_1 to c_2, where c_2 is the
value for which ρd reaches the edge of the image, we can underestimate the area
of the disc contained within the image, by using the faceted approximation shown
in Figure 10. The actual expression for the area in the third case, A_3, is relatively
complex, and is given in Appendix A.

The  final  case  occurs  when  the  actual  point  is  beyond  the  limits  of  the  image, 
but  the  disc  size  is  large  enough  that  some  portion  of  it  intersects  the  image.  The 
case  is  shown  in  Figure  11,  as  well  as  the  approximation  we  use  to  underestimate 
the  area.  Again  the  expression  is  complex,  and  is  given  in  Appendix  A. 

Depending on the specific values for ρ_m, c_1 and c_2 we can add in the appropriate
contributions from equations 16, 17, 32 and 34, together with the value for k (from
equation (15)) to obtain an underestimate for the expected area of an error disc,
that is, the expected area of a circle in image space that will be consistent with a point
expressed in terms of some affine basis. Since such discs can in general occur with
equal probability anywhere in the image, the probability that a model point lies
within a disc associated with an image point is simply the ratio of this area to the
area of the image. Thus by normalizing these equations, by dividing by (2r)², where
r is half the diameter of the image, we have an underestimate for the selectivity of
the scheme.

Figure 11:

Case 4. Underestimate of the area of the image error disc.

This leads to the following estimate for the selectivity of the scheme:

Proposition 3 Given a model basis and a fourth model point, the probability that
an image basis and a fourth image point, hypothesized to correspond to the model
basis and point, will map at random to a region of α-β-space consistent with the
model point and basis is given by

$$
\frac{A_1 + A_2 + A_3 + A_4}{4r^2} \qquad (19)
$$

where the A_i's are given by equations 16, 17, 32, and 34.

This is based on using the upper bound on the radius of the error discs. As
noted earlier, a simple lower bound can be obtained by substituting ε/2 in place of
ε, reflecting the use of the bound ε(1 + ρ) in place of 2ε(1 + ρ). In this case, the
bounds c_1 and c_2 will change slightly.

We can use this to compute example values for the selectivity, which depends
on ρ_m (the maximum value of |α| + |β|). If we allow any possible triple of points
to form a basis, then ρ_m can be arbitrarily large. Consider the example shown in
Figure (12). The value for ρ associated with the point p is given by

$$
\rho = \frac{\|p\|}{uv\,|\sin\phi|} \left( u\,|\sin\theta| + v\,|\sin(\phi - \theta)| \right).
$$

As φ approaches 0, this value becomes unbounded. We can exclude unstable bases
if we set limits on the allowable range of values for φ; in particular, we can restrict
our attention to bases with the property that

$$
\phi_0 \le \phi \le \pi - \phi_0 \quad \text{or} \quad \pi + \phi_0 \le \phi \le 2\pi - \phi_0 .
$$

Figure 12:

Diagram of affine coordinates.

Figure 13:

Graph of selectivity μ for ε = 3 as the basis vector length d varies.


By applying standard minimization methods, one finds that if the maximum distance
between any two model points is M and the minimum distance is m, then
the maximum value for ρ is given by

$$
\rho_m = \frac{M}{m} \, \frac{1}{\sin\phi_0}. \qquad (20)
$$
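
As a worked example, reading equation (20) as ρ_m = (M/m)(1/sin φ_0), the parameters of the earlier experiment (M = 250, m = 25, φ_0 = π/16) can be plugged in directly. The function name is illustrative.

```python
import math


def rho_max(max_sep, min_sep, phi0):
    """Bound on |alpha| + |beta| under the reading rho_m = (M/m) / sin(phi0)."""
    return (max_sep / min_sep) / math.sin(phi0)


print(round(rho_max(250.0, 25.0, math.pi / 16), 2))  # 51.26
```

This is consistent with the maximum of roughly 51 observed over the 300,000 samples.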


To evaluate the selectivity, we also need to know d, the length of the basis
vector, which can vary from 1 to r. Given a specific value for d, we can compute
the selectivity. To get a sense of the variation of μ as d changes, μ is plotted as a
function of d in Figure 13, for ε = 3.

In general, d will take on a variety of values, as the choice of basis points in the
image is varied. To get an estimate for the expected degree of selectivity, we perform
the following analysis. We assume, for simplicity, that the origin of the image basis
is at the center of the image. The second point used to establish the basis vector can
in principle lie anywhere in the image, with equal probability. Hence the probability
distribution for d is roughly (ignoring corner effects in the image)

$$
\frac{2d}{r^2}.
$$

Case      Measured    Predicted    Approximation
ε = 1     .000116     .000117      .000118
ε = 3     .001146     .001052      .001064
ε = 5     .003142     .002911      .002955

Table 1:

Table comparing simulated and predicted selectivities. In all cases, the ratio of maximum to
minimum separation of points was 10. The predicted column uses the full expression for μ from equation
19, while the approximation column uses the approximation given by equation 23. The measured
column reports actual observed selectivities obtained by generating sets of model and image features
at random, and counting the number of matches, within error, for a pairing of an image and
model basis.

We  could  explicitly  integrate  equation  19  with  respect  to  this  distribution  for  d 
to  obtain  an  expected  selectivity.  This  is  messy,  and  instead  we  pursue  two  other 
options. 

First, we can integrate this numerically for a set of examples, shown in Table 1
under the column marked predicted, which lists values for μ as a function of noise
in the image (with an image dimension of 2r = 500). The value of ρ_m was set
using φ_0 = π/16, and a ratio of maximum to minimum model point separation
of M/m = 10. It should be noted that varying φ_0 over the range π/8 to π/32
produced results very similar to those reported in the table. As one would expect,
the probability of a consistent match increases (selectivity decreases) with increasing
error in the measurements. Thus we can see that for ranges of parameters that one
would find in many recognition situations, a considerable fraction of the space of
possible α and β values is consistent with a given feature and basis.

To  test  the  validity  of  our  formal  development,  we  ran  a  series  of  simulations  on 
randomly chosen features to test the selectivity values μ, predicted by equation (19).
We  generated  sets  of  model  and  image  features  at  random,  chose  bases  for  each  at 
random,  then  checked  empirically  the  probability  that  a  model  point,  rewritten  in 
the  image  basis,  lay  within  the  associated  error  disc  of  an  image  point.  We  chose 
to  consider  only  cases  in  which  the  error  disc  fits  entirely  within  the  bounds  of  the 
image,  since  we  know  that  our  predictions  are  underestimates  for  the  other  cases. 
Table  1  summarizes  the  results,  under  the  column  marked  measured. 

Second, we can approximate the selectivity expression. By applying power series
expansions for the different terms in equations 16, 17, 32 and 34, and keeping only
first and second order terms, we arrive at

$$
\mu \approx \frac{k\pi\epsilon^2}{r^2} \left[ \frac{17}{12} + 2\log\frac{r}{d} + \frac{r^2 - d^2}{rd} \right]. \qquad (21)
$$

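Finding the expected value for equation 21 over the distribution for d, where d can
range from some minimum value ℓ to r, in turn yields the following approximation
for the expected selectivity:

$$
\bar{\mu} \approx \frac{2k\pi\epsilon^2}{r^2 - \ell^2} \left[ \frac{15}{8} - \frac{\ell^2}{r^2} \left( \frac{29}{24} + \log\frac{r}{\ell} + \frac{r}{\ell} - \frac{\ell}{3r} \right) \right]. \qquad (22)
$$

For the case of ℓ ≪ r, this reduces to

$$
\bar{\mu} \approx \frac{2k\pi\epsilon^2}{r^2} \left[ \frac{15}{8} - \frac{\ell}{r} \right], \qquad (23)
$$

and this predicts values in close agreement with those recorded in Table 1, as shown
in the column marked approximation.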
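
The averaging step can be checked numerically. The sketch below assumes that the per-basis approximation is μ(d) ≈ (kπε²/r²)[17/12 + 2 log(r/d) + (r² - d²)/(rd)], that d is distributed with density 2d/(r² - ℓ²) on [ℓ, r], and that the resulting closed form is (2kπε²/(r² - ℓ²))[15/8 - (ℓ²/r²)(29/24 + log(r/ℓ) + r/ℓ - ℓ/(3r))]. The value ℓ = 25 is an illustrative choice, not a value taken from the text.

```python
import math


def mu_approx(d, r, eps, k):
    """Leading-order selectivity for a basis vector of length d."""
    bracket = 17.0 / 12.0 + 2.0 * math.log(r / d) + (r * r - d * d) / (r * d)
    return k * math.pi * eps ** 2 / r ** 2 * bracket


def expected_mu_closed(ell, r, eps, k):
    """Closed-form average of mu_approx over d in [ell, r]."""
    inner = 29.0 / 24.0 + math.log(r / ell) + r / ell - ell / (3.0 * r)
    bracket = 15.0 / 8.0 - (ell / r) ** 2 * inner
    return 2.0 * k * math.pi * eps ** 2 / (r * r - ell * ell) * bracket


def expected_mu_numeric(ell, r, eps, k, steps=20000):
    """Midpoint-rule average of mu_approx with density 2d / (r^2 - ell^2)."""
    h = (r - ell) / steps
    total = 0.0
    for i in range(steps):
        d = ell + (i + 0.5) * h
        total += mu_approx(d, r, eps, k) * 2.0 * d / (r * r - ell * ell)
    return total * h


r, ell = 250.0, 25.0
rho_m = 10.0 / math.sin(math.pi / 16)
k = 2.0 / (3.0 - 2.0 / rho_m)
for eps in (1.0, 3.0, 5.0):
    print(eps, expected_mu_closed(ell, r, eps, k), expected_mu_numeric(ell, r, eps, k))
```

Because ε enters only through the ε² prefactor, the averaged approximation scales exactly quadratically with the error bound.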

Note that the selectivity is clearly not linear in sensor error. For a fixed size
image, increasing the error ε by some amount should decrease the selectivity (increase
the probability) by at least a quadratic effect (perhaps more since there are
higher order terms). This is reflected in Table 1, where increasing ε from 1 to 3
increases the predicted probability by roughly a factor of 9, and increasing from 1
to 5 increases the predicted probability by roughly a factor of 25. This expected
value of the selectivity allows us to analyze the probability that a match will be
reported at random by some recognition method that uses affine transformations.
The selectivity, μ̄, in essence reflects the power of a given quadruple of features to
distinguish a particular model. Now we consider the manner in which information
from multiple quadruples is combined. This analysis differs slightly for different
recognition methods. First we examine the geometric hashing method and then the
alignment method.
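
The quadratic dependence on the error bound can be read off Table 1 with a few lines of arithmetic. The values below are the predicted column of that table; the variable names are illustrative.

```python
# Predicted selectivities from Table 1, keyed by the error bound.
predicted = {1: 0.000117, 3: 0.001052, 5: 0.002911}

base = predicted[1]
for eps in sorted(predicted):
    ratio = predicted[eps] / base
    print(f"eps={eps}: ratio to eps=1 is {ratio:.2f}, eps squared is {eps ** 2}")
```

The ratios come out within about one percent of 9 and 25.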


4  The  Geometric  Hashing  Method 

We  are  now  ready  to  investigate  the  probability  that  the  geometric  hashing  method 
will  randomly  report  a  match  of  a  model  to  an  image,  under  an  affine  transformation 
from  the  model  to  the  image  [4,  16,  17,  18,  19,  22].  The  geometric  hashing  method 
is  based  on  the  idea  of  representing  an  object  by  storing  redundant  transformation- 
invariant information about it in a hash table. At recognition time, similar invariants
are computed from the sensory data, and are used to index into the hash table
to  find  possible  instances  of  the  model.  If  enough  of  the  sensor  invariants  score  a 
hit when hashing against the model table, one has in principle found an instance of
the  model  in  the  sensory  data. 
The formal description of the geometric hashing algorithm is for noise-free data.
A  number  of  variations  of  the  basic  geometric  method  have  been  presented,  and  have 
been  illustrated  using  data  from  real  images  [16, 17,  18, 19, 22]  with  associated  sensor 
noise.  The  experimental  results  reported  in  these  papers,  however,  have  been  limited 
to  relatively  simple  scenes.  A  modified  version  of  the  geometric  hashing  method 
has  been  reported  in  [4],  where  uncertainty  in  the  image  measurements  is  explicitly 
taken  into  account  using  a  probabilistic  model.  This  method  addresses  the  issue  of 
inexact sensor data; however, it leaves open the question of formal characterizations
of  the  expected  performance  of  the  method  in  the  presence  of  noise  and  clutter. 

Any  hashing  function  should  include  an  analysis  of  the  conditions  under  which 
collisions  will  occur;  when  will  different  data  items  be  mapped  to  the  same  key?  In 
this  section  we  provide  such  an  analysis  for  the  affine  hashing  method.  This  analysis 
is  particularly  crucial  in  the  case  of  the  affine  hashing  method,  because  the  hash  table 
is  also  implicitly  used  to  allow  for  small  amounts  of  uncertainty  in  the  sensor  data. 
That  is,  the  method  relies  on  the  fact  that  ‘similar’  sensor  values  will  be  hashed  to 
the  same  location.  As  we  have  seen  briefly  above,  however,  it  is  not  possible  to  use 
any simple tessellation of the α-β-space in order to correctly account for uncertainty.
In particular, the range of (α, β) values consistent with a given point depends on
the  actual  configuration  of  the  model  and  image  points.  The  configuration  of  the 
image  points  is  not  available  at  the  time  that  the  hash  table  is  constructed,  and 
thus  a  strictly  correct  table  cannot  be  built.  In  practice,  implementations  of  the 
method  use  approximations  that  simply  tessellate  the  space  uniformly  and  ignore 
the  effects  that  this  has  both  on  false  matches  and  false  rejections. 

4.1  Details  of  the  Geometric  Hashing  Method 

As  with  most  model-based  recognition  methods,  it  is  assumed  that  an  object  can  be 
represented  by  a  collection  of  features,  or  interest  points.  A  ‘match’  of  a  model  to  a 
scene  consists  of  a  mapping  of  a  subset  of  the  model  features  to  a  subset  of  the  image 
features,  such  that  applying  a  geometric  transformation  of  some  particular  type  to 
all  of  the  model  features  will  make  each  of  them  coincident  with  their  corresponding 
image  feature.  The  geometric  hashing  approach  has  been  used  with  various  types  of 
transformations; however, here we restrict ourselves to the case of a two-dimensional
affine  transformation. 

The  geometric  hashing  method  consists  of  two  basic  stages:  (i)  the  construction 
of a model hash table and (ii) the matching of the models to an image. The hash
table  is  used  to  store  a  redundant,  transformation-invariant  representation  of  each 
object.  This  representation  makes  use  of  the  fact  that  a  triple  of  model  points 
defines  an  invariant  coordinate  frame,  or  basis.  An  affine-invariant  model  of  an 
object  is  formed  by  expressing  the  locations  of  its  feature  points  in  terms  of  each 
such  transformation-invariant  coordinate  frame  (or  basis),  and  using  the  resulting 
coordinates  as  indices  for  storing  the  corresponding  basis  in  a  hash  table. 

A key assumption underlying the method is what we will term the affine hashing
hypothesis, which is that a point represented in terms of some basis will produce
the same coordinate values under any valid transformation of that point and basis.
This can be stated more formally as follows.

Assumption 1 Consider the four (ordered) ‘model’ points m_1, m_2, m_3, and m_4,
and the affine invariant coordinates (α, β) defined by m_4 - m_1 = α(m_2 - m_1) +
β(m_3 - m_1). Let these four points undergo a transformation, T, and denote the
resulting points by m'_i = T m_i, i = 1, ..., 4. It is assumed to be the case that
m'_4 - m'_1 = α(m'_2 - m'_1) + β(m'_3 - m'_1).

When  the  transformation  T  is  an  affine  transformation,  then  this  assumption 
is  true  [16,  17,  18,  19,  22].  However  when  T  is  a  transformation  mapping  a  model 
to  its  image  using  a  camera  or  other  sensing  device,  there  will  generally  be  errors 
in  the  locations  of  the  image  points.  In  these  cases  the  affine  hashing  assumption 
no  longer  holds.  In  the  previous  sections  we  have  analyzed  the  extent  to  which 
uncertainty  (or  error)  in  the  locations  of  image  points  impacts  this  assumption,  and 
hence  affects  the  geometric  hashing  method.  In  the  following  section  we  use  this 
analysis  to  determine  the  probability  that  the  affine  hashing  method  will  falsely 
report  a  match  when  none  is  present. 
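
The assumption, and the way sensor error breaks it, can be seen in a small numerical sketch. The model points, the affine map and the one-pixel shift below are arbitrary illustrative values, and the helper names are hypothetical.

```python
def affine_coords(o, b1, b2, p):
    """Solve p - o = alpha*(b1 - o) + beta*(b2 - o) for (alpha, beta)."""
    ux, uy = b1[0] - o[0], b1[1] - o[1]
    vx, vy = b2[0] - o[0], b2[1] - o[1]
    px, py = p[0] - o[0], p[1] - o[1]
    det = ux * vy - uy * vx  # non-zero when the basis points are not collinear
    return ((px * vy - py * vx) / det, (ux * py - uy * px) / det)


def apply_affine(t, p):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    a, b, c, d, tx, ty = t
    return (a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty)


model = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (3.0, 4.0)]
t = (2.0, 1.0, -1.0, 3.0, 5.0, -7.0)
image = [apply_affine(t, p) for p in model]

before = affine_coords(*model)  # coordinates in the model basis
after = affine_coords(*image)   # coordinates after an exact affine map

# Shift the fourth image point by one pixel to mimic sensor error.
noisy = list(image)
noisy[3] = (image[3][0] + 1.0, image[3][1])
perturbed = affine_coords(*noisy)

print(before, after, perturbed)
```

The first two pairs agree, while the perturbed pair does not, which is why a noisy image point corresponds to a region of α-β-space rather than to a single key.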

Before analyzing the performance of the method in the presence of sensor uncertainty,
we describe the method assuming that there is no sensor error and no
numerical  roundoff  error. 

For  each  model,  the  following  steps  are  used  to  enter  it  into  the  hash  table: 

1. Choose an ordered set of three model points m_1, m_2, m_3 as a basis, formed
by an origin

o = m_1

and a pair of axes

u = m_2 - m_1

v = m_3 - m_1.

2. For each additional model point m_i, rewrite the coordinates of the vector
m_i - o in the affine basis defined by the axes u, v. In other words, find the
coordinates α, β such that

m_i - o = αu + βv.

3. Hash into a table using the indices (α, β), and store at that point in the table
the basis triple (o, u, v).

4. Repeat this process for all possible choices of model bases (that is for all
ordered triples of model points). This results in a table indexed by affine-invariant
coordinates. Any pair of α and β values can be used to retrieve
those model bases (if any) for which some model point m_j has the affine-invariant
coordinates (α, β). In particular, if (α', β') are affine coordinates for
an image point, written in terms of some image basis, then (α', β') = (α, β) if
and only if there is a legal transformation of the four model points (the three
basis points together with the point represented by the affine coordinates)
that maps them onto the four associated image points.

At  recognition  time,  the  hash  table  is  used  to  determine  which  models  are  present 
in  the  image.  The  idea  is  that  if  we  select  a  triple  of  image  points  that  corresponds 
to  the  model  and  compute  the  coordinates  for  other  image  features  in  terms  of  this 
basis,  the  hash  table  will  contain  corresponding  entries  because  the  model  is  stored 
in  the  table  in  terms  of  every  possible  basis  (and  the  representation  is  invariant 
under  any  affine  transformation).  Thus,  if  we  have  selected  an  image  basis  that 
corresponds  to  the  model,  all  the  remaining  image  points  that  correspond  to  the 
model will produce (α, β) pairs that specify the same model basis in the hash table.

The  exact  processing  at  recognition  time  is  as  follows: 

1. Choose a set of three sensor points s_1, s_2, s_3 to form a basis, formed by an
origin

O = s_1

and a pair of axes

U = s_2 - s_1

V = s_3 - s_1.

2. For each additional sensor point s_i, rewrite the coordinates of the vector s_i -
O in the affine basis defined by the axes U, V. In other words, find the
coordinates α', β' such that

s_i - O = α'U + β'V.

3. Index into the hash table using the indices (α', β'), and retrieve the set of
entries at that point in the table. Any bases stored at that location are
possible candidate matches. Each time a given basis is retrieved from the
table, a corresponding counter is incremented in a histogram. This step is
repeated for all additional sensor points.

4.  Once  all  the  sensor  points  have  been  hashed,  the  histogram  contains  votes  for 
those  model  bases  that  could  correspond  to  the  current  sensor  basis,  (O,  U,  V). 
If  the  peak  in  the  histogram  for  a  given  model  basis,  (o,  u,v),  is  sufficiently 
high,  then  this  basis  is  selected  as  a  possible  match.  The  entire  model  can  then 
be  transformed  into  the  image  coordinates  and  compared  to  verify  that  the 
hypothesized  transformation  is  correct.  The  transformation  from  the  model 
to  the  image  coordinate  frame  can  be  computed  from  the  corresponding  model 
and  image  bases. 




5. The entire operation is repeated for all possible bases (that is, all triples of
image  points  are  considered  until  a  match  is  found).  On  each  iteration,  the 
histogram  counts  are  cleared. 
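
The recognition steps above can be sketched in a few lines of Python. This is only an illustrative sketch for perfect (noise-free) data: the function names, the uniform quantization of $(\alpha, \beta)$ into cells, and the cell size of 0.05 are assumptions made for the example and are not taken from any particular implementation of geometric hashing.

```python
from collections import defaultdict
from itertools import permutations


def affine_coords(origin, p_u, p_v, point):
    """Solve point - origin = alpha * U + beta * V, with U = p_u - origin, V = p_v - origin."""
    ux, uy = p_u[0] - origin[0], p_u[1] - origin[1]
    vx, vy = p_v[0] - origin[0], p_v[1] - origin[1]
    det = ux * vy - uy * vx
    if abs(det) < 1e-12:
        return None  # collinear points do not define an affine basis
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    return (dx * vy - dy * vx) / det, (ux * dy - uy * dx) / det


def bucket(alpha, beta, cell):
    """Quantize (alpha, beta) to a hash-table cell (the cell size is an assumption)."""
    return round(alpha / cell), round(beta / cell)


def build_table(model, cell=0.05):
    """Store every ordered model basis under the coordinates of each remaining model point."""
    table = defaultdict(set)
    for basis in permutations(range(len(model)), 3):
        o, u, v = (model[n] for n in basis)
        for n, point in enumerate(model):
            if n in basis:
                continue
            coords = affine_coords(o, u, v, point)
            if coords is not None:
                table[bucket(*coords, cell)].add(basis)
    return table


def vote(table, sensor, sensor_basis, cell=0.05):
    """Steps 1-4 for one sensor basis: histogram of the model bases that were retrieved."""
    o, u, v = (sensor[n] for n in sensor_basis)
    histogram = defaultdict(int)
    for n, point in enumerate(sensor):
        if n in sensor_basis:
            continue
        coords = affine_coords(o, u, v, point)
        if coords is None:
            break  # degenerate sensor basis, nothing to vote for
        for model_basis in table.get(bucket(*coords, cell), ()):
            histogram[model_basis] += 1
    return histogram
```

Step 5 would call `vote` once per triple of sensor points, starting from an empty histogram each time, and pass any model basis with a sufficiently high count on to verification.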

Since the description of the algorithm is for perfect data, issues concerning the
ellipse of uncertainty associated with a point are not addressed. In extending the
method to deal with uncertainty, Step 3 must be modified so that $\alpha'$ and $\beta'$ are
used to compute the ellipse of feasible values, and for any bucket in the hash table
that intersects this ellipse, the stored entries are retrieved and used to increment
the histogram. Presumably only one vote is cast for a model basis retrieved in
this manner, even if it appears in more than one bucket overlapping the ellipse of
uncertainty. Note, however, that because of these regions of uncertainty, a single
model feature may be retrieved by more than one image feature, an issue to which
we will return shortly.

5 The Sensitivity of Geometric Hashing in the Presence of Noise

Given that we can estimate ranges of values for the affine parameters $\alpha, \beta$, we can
turn  to  the  use  of  such  ranges  in  examining  the  sensitivity  of  the  geometric  hashing 
method.  The  main  question  of  concern  is  whether  a  random  collection  of  sensor 
points  can  masquerade  as  a  correct  interpretation.  That  is,  under  what  conditions 
is  it  likely  that  some  random  set  of  sensor  points,  rewritten  with  respect  to  some 
arbitrarily  chosen  sensor  basis,  will  index  an  incorrect  model  basis  enough  times  to 
give  a  histogram  vote  as  large  as  the  correct  interpretation?  We  can  investigate  the 
probability  of  such  false  positive  identifications  with  the  following  plan  of  action. 
(Recall  that  we  analyze  the  case  in  which  each  point  is  correctly  represented  by 
an  ellipse  with  a  given  uncertainty  value,  whereas  the  actual  implementations  of 
geometric  hashing  do  not  use  this  correct  expression.) 

1. We use the analysis from Section 4 to estimate the probability that a given
quadruple of image points will match a given quadruple of model points, given
bounded uncertainty of radius $\epsilon$ in the sensor data. We denote this by the
selectivity $\mu$ as given by equation (19), or its approximation in equation (23).

2. Each model basis is stored in the hash table according to the $m - 3$ remaining
model features, and thus there are $m - 3$ points that index to each model basis.
We are interested in the probability that a randomly chosen image point and
image basis will hash to a location in the $\alpha$-$\beta$ space that is consistent with
a given model basis. This is just the probability that for at least one of the
$m - 3$ model points, the image point lies within the error disc associated
with the model point, rewritten in terms of the image basis. If we assume
independently distributed features, then this is just

$$p = 1 - (1 - \mu)^{m-3}$$ (24)

This follows from the fact that the probability that a particular model point
is not consistent with a given pair of indices is $(1 - \mu)$ and by independence,
the probability that all $m - 3$ points are not consistent with this pair of
indices is $(1 - \mu)^{m-3}$. If $p$ is reasonably large, then there is a
high probability that we will see a large number of votes for a given basis
at random. Note that in doing this, we are actually underestimating the
probability of a vote. We should really evaluate the expected value of

$$1 - (1 - \mu)^{m-3}$$

rather than just evaluating the expected value for $\mu$ and using that directly
in computing $p$. Doing so leads to a more complicated expression that gives
values slightly larger than those obtained using the above expression. For
simplicity, we use the expression in equation (24).

3. For each image basis, all $s - 3$ remaining image points (other than the 3 points
used to establish the basis) are used to form an index to look up corresponding
bases in the table. If the probability of being consistent with a given basis
is $p$, then $ps$ should be much smaller than $m$ if we are to avoid a false positive.
More precisely, if the probability that a single hash lookup will cast a vote for
a particular model basis is $p$, then the probability of exactly $k$ votes out of
$s - 3$ is

$$q_k = \binom{s-3}{k} p^k (1 - p)^{s-3-k}$$ (25)

Further, the probability of a false positive identification of size at least $k$ is

$$w_k = 1 - \sum_{i=0}^{k-1} q_i.$$

Note that this is the probability of a false positive for a particular sensor basis
and a particular model basis.

4. Since the hash table is built by considering all possible model bases, there are
$\binom{m}{3}$ different bases entered into the table. The probability of a false positive
identification for a given sensor basis with respect to one model basis is $w_k$.
Hence, the probability of a false positive for this given sensor basis with respect
to any model basis is

$$e_k = 1 - (1 - w_k)^{\binom{m}{3}}.$$ (26)
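
To make the chain from equation (24) to equation (26) concrete, the following Python sketch evaluates it numerically. It is a sketch under stated assumptions: the selectivity `mu` is taken as a given input (equations (19) and (23) are not reproduced here), the function names are hypothetical, and the tail probability $w_k$ is summed directly from $k$ to $s - 3$, which equals $1 - \sum_{i<k} q_i$ but is better behaved numerically when the tail is small.

```python
from math import comb


def vote_probability(mu, m):
    """Equation (24): chance that one hashed image point votes for a given model basis."""
    return 1.0 - (1.0 - mu) ** (m - 3)


def at_least(p, n, k):
    """Probability of at least k successes in n independent trials, each with probability p."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(max(k, 0), n + 1))


def hashing_false_positive(mu, m, s, k):
    """Equation (26): false positive of size >= k for one sensor basis and any model basis."""
    p = vote_probability(mu, m)
    w_k = at_least(p, s - 3, k)  # binomial tail built from the terms of equation (25)
    return 1.0 - (1.0 - w_k) ** comb(m, 3)


if __name__ == "__main__":
    # mu = 0.0011 is the typical selectivity quoted in Section 5.1 of the text.
    for s in (25, 50, 100, 200):
        print(s, [hashing_false_positive(0.0011, 25, s, k) for k in (2, 5, 10, 20)])
```

Because the exponent $\binom{m}{3}$ is large (2300 for $m = 25$), even a small per-basis tail probability $w_k$ is amplified considerably in $e_k$.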


5.1  Testing  the  model 

To check the correctness of our model, we ran a series of experiments based on
equation (25). In particular, we generated random sets of model and image features,
with 25 model features and with 25, 50, 100 and 200 image features. We used our
analysis to generate a predicted distribution for the probability of a false positive
identification of size $k$. In each case, the uncertainty was set at $\epsilon = 3$, and the
cutoff on angular stability was $\phi_0 = \pi/16$. These values, together with data about the
minimum and maximum separation of model points, were used to generate values
for $\mu$, which were typically on the order of 0.0011.

For  comparison,  we  also  ran  some  simulations  on  these  data  sets,  by  selecting 
bases  for  both  the  model  and  the  image  at  random,  and  determining  the  size  of 
vote  associated  with  that  pairing  of  bases.  In  particular,  for  each  additional  model 
point,  we  computed  the  affine  coordinates  relative  to  the  chosen  basis,  then  used 
those  coordinates  to  determine  the  nominal  transformed  position  in  the  image.  We 
also  used  those  coordinates  to  determine  the  radius  of  the  associated  error  disc. 
For  each  image  point,  we  checked  to  see  if  at  least  one  of  the  error  discs  about  a 
transformed  model  point  contained  the  image  point.  If  so,  we  incremented  the  vote 
for  this  pairing  of  bases.  This  trial  was  repeated  1000  times.  We  excluded  choices 
of  model  bases  for  which  more  than  half  the  transformed  points  would  lie  outside 
the  extent  of  the  image.  The  results  of  these  trials  are  shown  in  Figures  (14)  and 
(15). 

One can see that the cases are in good agreement. In fact, our model tends to
overestimate the probability of small false positives, and underestimate the
probability of large false positives, so our results will tend to be conservative.

Next, we turn to the question of what $e_k$ looks like (recall that this is the
probability that any sensor basis will have at least one matching model basis). As
an illustration, we graph the probability of a false positive based on equation (26).
In particular, we use a selectivity based on $\epsilon = 3$, $\phi_0 = \pi/16$, obtained from Table 1,
and plot the value of $e_k$ for an object with 25 model features, for different values of $k$
and a given number of sensor features $s$. This is graphed in Figure (16). (A similar
set of graphs, for $m = 38$ and $m = 50$, is also shown in Figure (16).) The process
was repeated for different values for the number of sensor features $s$, generating
the family of graphs in the figure. In Figure (17) we graph the same probability
of a false positive based on equation (26), here using a selectivity corresponding to
errors of $\epsilon = 5$.

In Figure (16) the correct interpretation cannot account for more than 22, 35
and 47 model features, respectively, because three of the $m$ features always match
(and we used values of $m = 25, 38, 50$). Since in general the correct interpretation is
likely to have fewer than this number of features due to occlusion, one can see that
the probability of a false positive is “acceptable” only for cases with a moderate
number of sensor features, and limited error. If the error bound is $\epsilon = 3$, then
one can tolerate ratios of sensory data to model features as large as 10 : 1 while
expecting with probability nearly 0 to have a false positive peak in the histogram
as big as the model itself, for each choice of sensor basis. If we expect half of the
model to be occluded, then if the ratio of sensory data to model features is on the
order of 5 : 1, we expect with probability nearly 1 to have a false positive peak in
the histogram at least as big, but if the ratio is 3 : 1, the probability of a false peak
is nearly 0.

Figure 14:

Comparison of predicted and measured probabilities of false positives. Each graph compares the
probability of a false peak of size $k$ observed at random. The cases are for $m = 25$ and $s = 25$
and 50, from top to bottom. In each case, $\epsilon = 3$ and $\phi_0 = \pi/16$. The graph drawn with triangles
indicates the predicted probability, while the graph drawn with squares indicates the observed
empirical probabilities.

Figure 15:

Comparison of predicted and measured probabilities of false positives. Each graph compares the
probability of a false peak of size $k$ observed at random. The cases are for $m = 25$ and $s = 100$
and 200, from top to bottom. In each case, $\epsilon = 3$ and $\phi_0 = \pi/16$. The graph drawn with triangles
indicates the predicted probability, while the graph drawn with squares indicates the observed
empirical probabilities.

When we consider errors of $\epsilon = 5$ (Figure 17), however, much less clutter can be
tolerated. Now only ratios of sensor data to model features on the order of 4 : 1 can
be tolerated while ensuring that the probability of a false peak as big as the model
is nearly 0, and if the ratio is 6 : 1, this probability goes to nearly 1. If half of the
model is occluded, then the corresponding ratios reduce to 1 : 1 and 2 : 1.

In Figure (18), we show the false positive rate as the error rate changes. Each
figure plots the false positive rate, for model features $m = 25$, and for sensor features
varying from $s = 25$ to $s = 200$ by increments of 25. The individual plots are for
varying numbers of sensory features, and the process is repeated for changes in the
bound on the sensor error, given a fixed threshold on angle of $\phi_0 = \pi/16$. One can
see  that  if  the  error  is  very  small,  the  method  performs  well,  i.e.  the  probability  of  a 
false  positive  rapidly  drops  to  zero  even  for  small  numbers  of  model  features.  As  the 
sensor  error  increases,  however,  the  probability  of  a  false  positive  rapidly  increases, 
as  can  be  seen  by  comparing  different  families  of  plots  in  Figure  (18).  Note  that 
the  best  possible  correct  solution  would  be  for  k  =  22. 

To  compare  our  analysis  with  real  data,  we  have  performed  the  following  test. 
Lamdan  et  al.  [16]  report  data  for  the  number  of  correct  and  incorrect  votes  for  a 
model  basis  in  the  histogram,  as  a  function  of  the  size  of  the  vote.  This  is  done  for 
an  image  with  28  features,  and  a  model  with  21  features.  Using  this  data,  we  can 
estimate  the  probability  of  a  false  positive,  for  this  image  and  model,  as  a  function 
of  the  size  of  the  vote.  This  is  graphed  in  Figure  (19),  (the  triangles).  We  can  also 
use  equation  (26)  to  predict  this  probability.  We  do  this  for  four  different  values 
for  the  selectivity  factor,  as  indicated.  One  can  see  that  while  the  graphs  do  not 
exactly  match,  due  to  the  assumptions  of  the  analysis,  the  predicted  probability  of 
a  false  positive  is  reasonably  close  to  that  observed  in  the  real  data  case. 

In part the results that we describe above are overly pessimistic, based on the
error model of $\epsilon$-bounded sensory uncertainty. In essence this model assumes that
the location of a feature within this error disc has a uniform probability for all
positions within the disc. Perhaps a more realistic model would be to let the probability
drop off with distance from the center of the disc, e.g. using a normal distribution.
This is similar to the error model used in [4], which uses a probabilistic formulation of
positional uncertainty. We can model this effect with the following. Assume that
while the overall bound on positional error is $\epsilon$, with probability $\nu$ the deviation of
the feature’s position is at most $\epsilon'$. Then if $m$ is the number of model features, the expected
size of the correct interpretation, given this error model, is

$$\nu o m$$ (27)

where $o$ is the fraction of the model expected to remain unoccluded by other objects. For
example, suppose $m = 25$, $o = .75$ and $\epsilon = 5$. Then the probability of a false positive
of size 19 (the correct interpretation) is one, if $s = 25$ (see Figure 18c). On the other
hand, if we consider, say, $\nu = .9$ for $\epsilon' = 1$, then we move from Figure 18c to Figure
18a. Now the expected size of the correct interpretation is 17, but we can tolerate
sensor clutter as high as $s = 150$ and still have the probability of a false positive be
vanishingly small.

Figure 16:

Graph of probability of false positives. Vertical axis is probability of false positive of size $k$,
horizontal axis is $k$. Each graph represents a different number of sensor features, starting with
$s = 25$ for the left most graph, and increasing by increments of 25. In the top case, the model
consisted of 25 features, in the middle case, 38 features, and in the bottom case, 50 features.
Selectivity was based on $\epsilon = 3$ error.

Figure 17:

Graph of probability of false positives. Vertical axis is probability of false positive of size $k$,
horizontal axis is $k$. Each graph represents a different number of sensor features, starting with
$s = 25$ for the left most graph, and increasing by increments of 25. In the top case, the model
consisted of 25 features, in the middle case, 38 features, and in the bottom case, 50 features.
Selectivity was based on $\epsilon = 5$ error.




Figure 18:

Graph of probability of false positives. Vertical axis is probability of false positive of size $k$,
horizontal axis is $k$. Each graph represents a different number of sensor features, starting with
$s = 25$ for the left most graph, and increasing by increments of 25. In all cases, the model consisted
of 25 features. In the top case, the sensor error was $\epsilon = 1$, in the middle case, $\epsilon = 3$, and in the
bottom case, $\epsilon = 5$.

Figure 19:

Graph of probability of false positives - real data. Vertical axis is probability of false positive of
size $k$, horizontal axis is $k$. The graph with the triangles is based on the data reported by Lamdan
et al. for an image with $s = 28$ and a model with $m = 21$. The other four graphs (squares) are
the predicted probability of a false positive, for selectivities based on $\epsilon = 3, 5, 7$, and 9 from left to
right respectively.

In practice, the implementations of geometric hashing are using such an error
model. By tessellating the hash table they are approximating an error region of
some size (this is not exactly correct, since a square, or even circular, region in
$\alpha$-$\beta$ space maps into an odd shaped region in image space). The size of this region
is  generally  smaller  than  the  actual  bound  on  sensor  error.  This  may  cause  the  best 
interpretation  to  be  smaller  than  one  could  achieve  if  one  exactly  modeled  the  error 
effects,  but  at  the  same  time,  it  reduces  the  probability  of  seeing  false  positives. 

6  The  Sensitivity  of  Alignment  in  the  Presence  of  Noise 

A second object recognition method based on affine transformations is the alignment
method [2, 12, 13]. The initial version of the affine-invariant alignment
method was restricted to planar objects [12], whereas later versions operate on
three-dimensional models (unlike affine hashing which uses two-dimensional models).
The two-dimensional version of the alignment method bears some similarity
to  the  geometric  hashing  approach,  but  differs  in  several  fundamental  aspects. 

The  basic  alignment  method  is  summarized  as  follows: 

•  Choose  an  ordered  triple  of  image  features  and  an  ordered  triple  of  model 
features,  and  hypothesize  that  these  are  in  correspondence. 

•  Use  this  correspondence  to  compute  an  affine  transformation  mapping  the 
model  into  the  image. 

• Apply this transformation to all of the remaining model features, thereby
mapping them into the image.

• Search over an appropriate neighborhood about each projected model feature
for a matching image feature, and count the total number of matched features.

This  operation  is  in  principle  repeated  for  each  ordered  triple  of  model  and  image 
features,  although  it  may  be  terminated  after  one  or  more  matches  are  found,  or 
after  a  certain  number  of  triples  are  tried  without  finding  a  match. 

The two-dimensional affine transformation $A$, consisting of a linear transformation
$L$ and a translation $b$, is computed from three pairs of model and image points
$(a_m, a_i)$, $(b_m, b_i)$ and $(c_m, c_i)$, using the following procedure:

a) Translate the model so that the point $a_m$ is at the origin.

b) Define the translation vector $b = -a_i$, and translate the image points so that
the new $a_i$ is at the origin, the new $b_i$ is at $b_i - a_i$ and the new $c_i$ is at $c_i - a_i$.

c) Solve for the linear transformation

$$L = \begin{pmatrix} l_{11} & l_{12} \\ l_{21} & l_{22} \end{pmatrix}$$

given by the two pairs of equations in two unknowns

$$L b_m = b_i$$

and

$$L c_m = c_i.$$
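
Steps a) to c) can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the alignment literature: the function names are hypothetical, and folding the two translations into a single offset `t`, so that a model point `x` maps to `L x + t`, is a packaging choice made for the example.

```python
def solve_affine(model_pts, image_pts):
    """Return (L, t) with image = L * model + t for three model/image point pairs."""
    (am, bm, cm), (ai, bi, ci) = model_pts, image_pts
    # Steps a) and b): translate so that a_m and a_i sit at the origin.
    bmx, bmy = bm[0] - am[0], bm[1] - am[1]
    cmx, cmy = cm[0] - am[0], cm[1] - am[1]
    bix, biy = bi[0] - ai[0], bi[1] - ai[1]
    cix, ciy = ci[0] - ai[0], ci[1] - ai[1]
    det = bmx * cmy - cmx * bmy
    if abs(det) < 1e-12:
        raise ValueError("model basis points are collinear")
    # Step c): L b_m = b_i and L c_m = c_i, i.e. L = [b_i c_i] [b_m c_m]^(-1).
    l11 = (bix * cmy - cix * bmy) / det
    l12 = (cix * bmx - bix * cmx) / det
    l21 = (biy * cmy - ciy * bmy) / det
    l22 = (ciy * bmx - biy * cmx) / det
    # Combined offset: x -> L (x - a_m) + a_i = L x + (a_i - L a_m).
    tx = ai[0] - (l11 * am[0] + l12 * am[1])
    ty = ai[1] - (l21 * am[0] + l22 * am[1])
    return ((l11, l12), (l21, l22)), (tx, ty)


def apply_affine(L, t, p):
    """Map a model point p into image coordinates."""
    return (L[0][0] * p[0] + L[0][1] * p[1] + t[0],
            L[1][0] * p[0] + L[1][1] * p[1] + t[1])
```

The remaining model features would then be passed through `apply_affine` and each projected position searched for a nearby image feature.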

In  a  manner  very  similar  to  that  used  in  the  previous  section,  we  can  analyze 
the  sensitivity  of  the  alignment  method.  As  before,  the  main  question  of  concern 
is whether a random collection of sensor points can masquerade as a correct
interpretation. In this case, we can investigate the probability of such false positive
identifications  with  the  following  plan  of  action. 

1. As before, the selectivity of a given quadruple of points is given by the
expression for $\mu$ in equation (19).

2. Since each model point is projected into the image, the probability that a
given model point matches at least one image point is

$$p' = 1 - (1 - \mu)^{s-3}.$$

This follows from the fact that the probability that a particular model point
is not consistent with a particular image point is $(1 - \mu)$ and by independence,
the probability that all $s - 3$ points are not consistent with this model point
is $(1 - \mu)^{s-3}$.

3. The process is repeated for each model point, so the probability of exactly $k$
of them having a match is

$$q'_k = \binom{m-3}{k} (p')^k (1 - p')^{m-3-k}$$ (28)

Further, the probability of a false positive identification of size at least $k$ is

$$w'_k = 1 - \sum_{i=0}^{k-1} q'_i.$$

Note that this is the probability of a false positive for a particular sensor basis
and a particular model basis.

4. This process can be repeated for all choices of model bases, so the probability
of a false positive identification for a given sensor basis with respect to any
model basis is

$$e'_k = 1 - (1 - w'_k)^{\binom{m}{3}}.$$ (29)
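
The alignment estimate in equation (29) can be evaluated numerically with a short Python sketch. As with any such sketch, some choices are assumptions: the selectivity `mu` is supplied by the caller, the function name is hypothetical, and the tail probability is summed directly over the $m - 3$ projected model points instead of being formed as one minus a partial sum.

```python
from math import comb


def alignment_false_positive(mu, m, s, k):
    """Equation (29): false positive of size >= k for one sensor basis and any model basis."""
    # Probability that one projected model point matches at least one of s - 3 image points.
    p_match = 1.0 - (1.0 - mu) ** (s - 3)
    n = m - 3  # model points left after removing the three basis points
    # Binomial tail: at least k of the n projected model points find a match.
    w_k = sum(comb(n, i) * p_match**i * (1.0 - p_match) ** (n - i)
              for i in range(max(k, 0), n + 1))
    return 1.0 - (1.0 - w_k) ** comb(m, 3)


if __name__ == "__main__":
    # mu = 0.0011 is the typical selectivity quoted in Section 5.1 of the text.
    for s in (25, 50, 100, 200):
        print(s, [alignment_false_positive(0.0011, 25, s, k) for k in (2, 5, 10, 20)])
```

Since at most $m - 3$ model points can be matched, the value is exactly zero for any $k > m - 3$, however many sensor features are present.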

6.1  Testing  the  model 

To check the correctness of our model, we have run a series of experiments based on
equation (28). In particular, we have used our analysis to generate a distribution for
the probability of a false positive identification of size $k$, given $\epsilon = 3$ and $\phi_0 = \pi/16$,
and using a model with 25 features and images with 25, 50, 100 and 200 features.
For  comparison,  we  also  generated  a  set  of  model  and  image  points  of  the  same  size, 
selected  bases  for  each  at  random,  and  determined  the  size  of  vote  associated  with 
that  pairing  of  bases.  In  particular,  for  each  additional  model  point,  we  computed 
the  affine  coordinates  relative  to  the  chosen  basis,  then  used  those  coordinates  to 
determine  the  nominal  position  of  an  associated  image  point,  together  with  a  disc  of 
uncertainty  about  that  point.  We  simply  checked  to  see  if  at  least  one  image  point 
lay  within  a  model  point’s  error  disc.  If  so,  we  incremented  the  vote  for  this  pairing 
of  bases.  This  trial  was  repeated  1000  times.  The  results  are  shown  in  Figures  (20) 
and  (21). 

One can see that the cases are in good agreement. In fact, our model tends to
overestimate the probability of small false positives, and underestimate the
probability of large false positives, so our results will tend to be conservative.

Next, we turn to the question of what $e'_k$ looks like. As an illustration, we
graph in Figure (22) the probability of a false positive based on equation (29). In
particular, we use a selectivity based on $\epsilon = 3$, obtained from Table 1, and plot
the value of $e'_k$ for an object with 25 model features, for different values of $k$ and a
given number of sensor features $s$. (A similar set of graphs, for $m = 38$ and
$m = 50$, is also shown in Figure (22).) The process
was repeated for different values for the number of sensor features $s$, generating the
family of graphs in the figure.

Figure 20:

Comparison of predicted and measured probabilities of false positives. Each graph compares the
probability of a false peak of size $k$ observed at random. The cases are for $m = 25$ and $s = 25$ and
50, from top to bottom. In each case, $\epsilon = 3$.

Figure 21:

Comparison of predicted and measured probabilities of false positives. Each graph compares the
probability of a false peak of size $k$ observed at random. The cases are for $m = 25$ and $s = 100$
and 200, from top to bottom. In each case, $\epsilon = 3$.

In Figure (23) we graph the same probability of a false positive based on equation
(29), here using a selectivity corresponding to errors of $\epsilon = 5$.

These  results  can  be  compared  to  the  graphs  for  the  geometric  hashing  method. 
Several  observations  are  in  order.  First,  the  false  positive  curves  are  generally 
more  favorable  in  the  alignment  case.  That  is,  the  probability  of  a  false  positive  is 
considerably  smaller  for  alignment  than  for  the  comparable  case  of  hashing.  This 
is  mostly  due  to  the  fact  that  hashing  is  based  on  testing  all  image  points  to  see 
if  there  is  a  matching  model  point,  while  alignment  uses  the  model  points  to  look 
for  matching  image  points.  Since  there  are  generally  many  more  image  points  than 
model  points,  alignment  is  likely  to  have  a  lower  rate  of  false  positives.  If  geometric 
hashing  were  to  keep  track  of  which  model  points  have  been  matched  to  an  image 
point,  a  comparable  performance  would  be  expected.  Second,  the  false  positive 
curves for alignment approach an asymptotic limit of a step function, with
cutoff  at  m,  while  geometric  hashing,  in  the  form  considered  here,  tends  to  shift 
the  curves  linearly  with  increasing  s.  Overall,  one  concludes  that  alignment  can 
tolerate  considerably  more  clutter  in  the  scene  than  hashing. 

In  Figure  (24),  we  show  the  false  positive  rate,  as  the  error  rate  changes.  Each 
figure  plots  the  false  positive  rate,  for  model  features  m  =  25,  and  for  sensor  features 
varying  from  s  =  25  to  s  =  200  by  increments  of  25.  The  individual  plots  are  for 
varying  numbers  of  sensory  features,  and  the  process  is  repeated  for  changes  in  the 
bound on the sensor error, given a fixed threshold on angle of $\phi_0 = \pi/16$. One can
see  that  if  the  error  is  very  small,  the  method  performs  well,  i.e.  the  probability  of  a 
false  positive  rapidly  drops  to  zero  even  for  small  numbers  of  model  features.  As  the 
sensor  error  increases,  however,  the  probability  of  a  false  positive  rapidly  increases, 
as  can  be  seen  by  comparing  different  families  of  plots  in  Figure  (24).  Note  that 
the  best  possible  correct  solution  would  be  for  k  =  22. 

7  Relation  to  Previous  Work 

The first analysis of the effects of sensor uncertainty on affine matching was done in
[11]. At that time, we did not have the precise expression for error bounds given in
Proposition 1, but we were able to produce approximations to the range of values
for the affine coordinates $(\alpha, \beta)$. We used these in numerical simulations, showing
empirically that the range of values associated with a pair of affine coordinates
under this uncertainty model increased with increasing $\epsilon$, but also with the parameters
associated with the basis vectors. Although the article only illustrated average
selectivity values, obtained by averaging the selectivity ranges over all possible choices
of points, our data supported the idea that selectivity also depended on the specific
point, i.e. on the actual value of $(\alpha, \beta)$. The conclusion drawn in that work was that
the verification stage of the recognition process would be critical for methods like
geometric hashing, since a large number of possible pairings of model and image
basis would be hypothesized. We also concluded that in order to keep the
combinatorics manageable, geometric hashing, like generalized Hough transforms [9],
and tree search methods [7, 8], should be connected to a good grouping or selection
method.

Figure 22:

Graph of probability of false positives. Vertical axis is probability of false positive of size $k$,
horizontal axis is $k$. Each graph represents a different number of sensor features, starting with
$s = 25$ for the left most graph, and increasing by increments of 25. In the top case, the model
consisted of 25 features, in the middle case, 38 features, and in the bottom case, 50 features.
Selectivity was based on $\epsilon = 3$ error.

Figure 23:

Graph of probability of false positives. Vertical axis is probability of false positive of size $k$,
horizontal axis is $k$. Each graph represents a different number of sensor features, starting with
$s = 25$ for the left most graph, and increasing by increments of 25. In the top case, the model
consisted of 25 features, in the middle case, 38 features, and in the bottom case, 50 features.
Selectivity was based on $\epsilon = 5$ error.

Figure 24:

Graph of probability of false positives. Vertical axis is probability of false positive of size $k$,
horizontal axis is $k$. Each graph represents a different number of sensor features, starting with
$s = 25$ for the left most graph, and increasing by increments of 25. In all cases, the model consisted
of 25 features. In the top case, the sensor error was $\epsilon = 1$, in the middle case, $\epsilon = 3$, and in the
bottom case, $\epsilon = 5$.

In  a  more  recent  anaJysis  of  the  affine  hashing  method,  Lamdan  &  VVolfson  [20] 
(see  also  [22])  discuss  the  error  properties  of  affine  transformations  under  the  same 
error  model.  Their  approach  is  to  consider  the  equation 

Ax  =  d 


where  the  columns  of  matrix  A  are  defined  by  m2  —  mi  and  m3  -  mi  and  where 
d  =  p  —  mi,  with  p  representing  the  point  of  interest.  The  vector  x  defines  the 
affine  coordinates  of  p  and  is  obtained  by  inverting  the  matrix  equation.  To  account 
for  error,  Lamdan  &  Wolfson  consider 


(A  +  SA)  (x  +  ^x)  =  d  +  ^d. 


They claim that the values of the entries of $\delta A$ and $\delta d$ are bounded by $\epsilon$, and use
this in their analysis. In fact, because these entries are defined as the difference of
two uncertain vectors, the error bounds on the values should be $2\epsilon$, which suggests
that the examples given in [20] are actually for cases in which the sensor error is
half of that reported.
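The factor of two can be checked with a small numerical sketch. The function name and the sample points below are illustrative assumptions, not taken from [20]: two points are each displaced by $\epsilon$ in opposite directions, and the resulting change in their difference is measured.

```python
import math

def difference_shift(eps, direction=(1.0, 0.0)):
    """Change in (m2 - m1) when m1 and m2 each move by eps in opposite directions.

    m1 and m2 are hypothetical sample points. Opposite displacements give the
    worst case for the difference of two points with bounded error eps.
    """
    ux, uy = direction
    length = math.hypot(ux, uy)
    ux, uy = ux / length, uy / length

    m1 = (0.0, 0.0)
    m2 = (10.0, 4.0)

    # Displace m1 by -eps and m2 by +eps along the unit direction.
    m1_p = (m1[0] - eps * ux, m1[1] - eps * uy)
    m2_p = (m2[0] + eps * ux, m2[1] + eps * uy)

    exact = (m2[0] - m1[0], m2[1] - m1[1])
    noisy = (m2_p[0] - m1_p[0], m2_p[1] - m1_p[1])
    return math.hypot(noisy[0] - exact[0], noisy[1] - exact[1])

shift = difference_shift(eps=3.0)  # the x entry of the difference moves by 2 * eps
```

With the default direction, the x entry of the difference changes by $2\epsilon$ while each individual point moves by only $\epsilon$.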

Based  on  this  equation,  they  use  results  from  numerical  analysis  to  bound  the 
magnitude  of  the  uncertainty  in  the  affine  coordinates,  using: 


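$$\frac{\|\delta x\|}{\|x\|} \le \kappa(A) \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta d\|}{\|d\|} \right) + O(\epsilon^2) \qquad (30)$$

where

$$\kappa(A) = \|A\| \, \|A^{-1}\|$$

is the condition number of the matrix A.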
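As a rough illustration of how the bound in (30) behaves, the following sketch solves Ax = d for a hypothetical basis and point, perturbs A and d slightly, and compares the observed relative change in x with the first-order right-hand side. The infinity norm is an assumption made here for simplicity (the text does not fix a norm), and all function names and numeric values are illustrative.

```python
def basis_system(m1, m2, m3, p):
    """Build A (columns m2 - m1 and m3 - m1) and d = p - m1."""
    A = ((m2[0] - m1[0], m3[0] - m1[0]),
         (m2[1] - m1[1], m3[1] - m1[1]))
    d = (p[0] - m1[0], p[1] - m1[1])
    return A, d

def inverse2(A):
    """Inverse of a 2x2 matrix; fails if the basis points are collinear."""
    (a, b), (c, e) = A
    det = a * e - b * c
    if det == 0:
        raise ValueError("basis points are collinear")
    return ((e / det, -b / det), (-c / det, a / det))

def apply2(M, v):
    """Matrix-vector product for a 2x2 matrix."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def affine_coords(A, d):
    """Solve A x = d for the affine coordinates x."""
    return apply2(inverse2(A), d)

def vec_norm(v):
    # Infinity norm, chosen only for simplicity (an assumption of this sketch).
    return max(abs(v[0]), abs(v[1]))

def mat_norm(M):
    # Induced infinity norm: maximum absolute row sum.
    return max(abs(M[0][0]) + abs(M[0][1]), abs(M[1][0]) + abs(M[1][1]))

def condition_number(A):
    return mat_norm(A) * mat_norm(inverse2(A))

def first_order_bound(A, d, dA, dd):
    """Right-hand side of (30), dropping the O(eps^2) term."""
    return condition_number(A) * (mat_norm(dA) / mat_norm(A)
                                  + vec_norm(dd) / vec_norm(d))

def relative_error(A, d, dA, dd):
    """Observed ||dx|| / ||x|| for the perturbed system."""
    x = affine_coords(A, d)
    A_p = tuple(tuple(A[i][j] + dA[i][j] for j in range(2)) for i in range(2))
    d_p = (d[0] + dd[0], d[1] + dd[1])
    x_p = affine_coords(A_p, d_p)
    return vec_norm((x_p[0] - x[0], x_p[1] - x[1])) / vec_norm(x)

# Hypothetical basis and point of interest.
A, d = basis_system((0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (50.0, 25.0))
dA = ((0.2, -0.1), (0.1, 0.2))   # illustrative perturbation of A
dd = (0.2, -0.2)                 # illustrative perturbation of d
observed = relative_error(A, d, dA, dd)
bound = first_order_bound(A, d, dA, dd)
```

For this well-conditioned basis the condition number is 1, so the bound reduces to the sum of the two relative perturbations. A nearly collinear basis would inflate the condition number and, with it, the bound.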

This result does not define the actual range of values, nor the shape of the
region of uncertain values associated with an affine coordinate pair, but it does
bound the magnitude of the uncertainty, in a manner roughly consistent with the
results presented in [11]. Note that the magnitude of the uncertainty depends on
the magnitude of the actual affine coordinates $\|x\|$, as well as on the magnitude of
the point $\|d\|$ and properties of the affine basis $\|A\|$. Using numerical simulations
(modulo the incorrect values for $\epsilon$), Lamdan & Wolfson reach essentially the same
conclusion as that of [11], namely that except in simple cases, geometric hashing
without verification will produce far too many false hypotheses to be used as a pure
recognition technique. They note, by using methods from [9], that in the case of
rigid motions and simple scenes, verification may not be needed. They suggest that
geometric hashing does form a useful preprocessing stage for subsequent verification,
by reducing the number of cases that verification must consider.
8  Summary


The computation of an affine-invariant representation in terms of a coordinate frame
$(m_1, m_2, m_3)$ has been used in the alignment [12, 13] and geometric hashing [16, 17,
18, 19, 22] model-based recognition methods. These recognition methods were both
developed assuming no uncertainty in the sensory data, and then various heuristics
were used to allow for error in the locations of sensed points. In this paper we have
formally examined the effect of sensory uncertainty on these recognition methods.
This analysis involves considering both the Euclidean plane used by the alignment
method, and the space of affine-invariant $(\alpha, \beta)$ coordinates used by the geometric
hashing method. Our analysis models each sensor point in terms of a disc of possible
locations, where the size of this disc is bounded by an uncertainty factor, $\epsilon$.

Under the bounded uncertainty error model, in the Euclidean space the set of
possible values for a given point x and a basis $(m_1, m_2, m_3)$ forms a disc whose
radius is bounded by $r = k\epsilon(1 + |\alpha| + |\beta|)$, where $1 \le k \le 2$. That is, assuming
that each image point has a sensing uncertainty of magnitude $\epsilon$, the range of image
locations that are consistent with x forms a circular region. In the $\alpha$-$\beta$-space,
the set of possible values of the affine coordinates of a point x in terms of a basis
$(m_1, m_2, m_3)$ forms an ellipse (except in degenerate cases). The area, center
and orientation of this ellipse are given by somewhat complicated expressions that
depend on the actual configuration of the basis points.
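The radius bound can be made concrete with a short sketch. The function names and the sample basis are illustrative assumptions. The worst-case construction below is one way to see where a factor k between 1 and 2 can arise, and is not presented as the derivation used in this paper. It assumes the predicted point is the affine combination $(1 - \alpha - \beta) m_1 + \alpha m_2 + \beta m_3$ and that the sensed point itself contributes a further $\epsilon$.

```python
import math

def affine_point(m1, m2, m3, alpha, beta):
    """Point with affine coordinates (alpha, beta) in the basis (m1, m2, m3)."""
    return (m1[0] + alpha * (m2[0] - m1[0]) + beta * (m3[0] - m1[0]),
            m1[1] + alpha * (m2[1] - m1[1]) + beta * (m3[1] - m1[1]))

def radius_bounds(eps, alpha, beta):
    """Interval [eps * s, 2 * eps * s] with s = 1 + |alpha| + |beta| (k = 1, k = 2)."""
    s = 1.0 + abs(alpha) + abs(beta)
    return eps * s, 2.0 * eps * s

def worst_case_shift(eps, alpha, beta):
    """Shift of the predicted point when each basis point moves by eps along x,
    with the sign of each move matching the sign of that point's weight."""
    m1, m2, m3 = (0.0, 0.0), (100.0, 0.0), (0.0, 100.0)  # hypothetical basis
    weights = (1.0 - alpha - beta, alpha, beta)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    p1 = (m1[0] + signs[0] * eps, m1[1])
    p2 = (m2[0] + signs[1] * eps, m2[1])
    p3 = (m3[0] + signs[2] * eps, m3[1])
    before = affine_point(m1, m2, m3, alpha, beta)
    after = affine_point(p1, p2, p3, alpha, beta)
    return math.hypot(after[0] - before[0], after[1] - before[1])

eps = 3.0
for alpha, beta in [(0.5, 0.25), (2.0, -1.0)]:
    low, high = radius_bounds(eps, alpha, beta)
    # Assumption: add eps for the uncertainty of the sensed point itself.
    radius = worst_case_shift(eps, alpha, beta) + eps
    print(alpha, beta, low, radius, high)
```

Points whose affine coordinates are large in magnitude, such as the second sample, receive proportionally larger discs, which is why basis choice matters for selectivity.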

The most important consequence of our analysis is the fact that the set of possible
values in the $\alpha$-$\beta$-plane cannot be computed independent of the actual locations
of the model or the image basis points. This means that the table constructed by
the geometric hashing method can only approximate the correct values, because
the locations of the image points are not known at the time that the table is
constructed. We further find that the geometric hashing method works well when there
is  little  noise  in  the  measurements,  and  when  the  amount  of  spurious  data  in  the 
scene  is  limited.  When  the  noise  levels  are  even  moderate,  however,  we  find  that 
the  method  degrades  considerably,  and  in  particular,  that  the  probability  of  a  false 
positive  recognition  becomes  significant.  This  probability  also  increases  rapidly  as 
a  function  of  the  number  of  sensory  features.  This  suggests  that  the  method  will 
require  that  a  substantial  number  of  hypothesized  matches  be  ruled  out  by  some 
subsequent  verification  stage.  In  analyzing  the  alignment  method,  we  find  that  the 
probability  of  a  false  match  is  substantially  lower  than  for  geometric  hashing,  largely 
because  the  alignment  method  explicitly  keeps  track  of  which  model  features  have 
been  matched  to  image  features. 
A.  Determining  the  Area  of  the  Consistent  Region


An  approximation  to  the  area  in  the  third  case,  using  the  underestimate  of  Figure  10 
with  p  =  pd  leads  to 
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Here  C2  =  min  {pruijli  so  that  we  integrate  out  until  either  we  hit  the  maximum 
value  for  p,  or  the  center  of  the  disc  reaches  the  edge  of  the  image.  Substitution, 
under  the  condition  that  pm  >  ^2  yields 
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dr*  +  2€^(r  -  d)  /  .  2€  ttN 

v/r*  -  4ci  V  r  2j 

d*r  +  2€*(d  -  r)  /  .  2c  xN 

Vd*  —  4c*  \  d  2  / 

+.(r-d)(logJ-log^) 

2c(d  +  r)  (3c(d  -  r)  —  rd  +  8c*) 


+ 


+27rc* 


(r  -  2c)(d  +  2c) 

,  ..  r*  +  d*  +  2c(d  -  r)  r  r  -  2c 


■  (32) 
An approximation to the area in the fourth case, using the approximation shown
in Figure 11, yields:


Aa  — 


f  k  {2e{  1  +  p)  -  pd  +  r)  -  (r  -  pdyp  ^  dp 

Jp=C2 

k  -  d  -  yja  +  bp +  cp'^ 

. 


;  arcsin 


(o(2<  +r)+  |(2e  -  d)) 


py/^ 


.  2cp  +  b 
arcsin 


V^J 


C3 


p=C2 


(33) 


where  C3  =  min{pm,  }• 

Substitution,  under  the  condition  that  pm  >  <^3  yields 


Aa  =  2k 


2({d  +  r) 


dr  +  ((d  -  r) 
dr 


8t^  +  +  f(r  -  2c)(d  -  r)  /x  .  2f 

A - ,  - - arcsin  — 

v/r2  -  4e2  V2  r 


+ 


—  d?r  —  ejd  +  2(){d  —  r)  f  x 
-  4e2 


(i- 

(1  + arcsin  I) 


(34) 